Jiaqi Sun (js3599) and Pranav Gupta (ppg22)

Objectives

As internet services become available in all regions of the world, we face the unprecedented task of overcoming barriers such as language in order to provide cutting-edge, high-quality content to each person on the planet. Artificial intelligence (AI) technologies such as speech recognition are key to this transformation, but they require a diverse training corpus, such as speech recordings, which is not widely available for many of the world's languages.

Figure 1 General View of the System

Introduction

By building a simple and easy-to-use karaoke system, we can crowdsource a speech-to-text corpus that can be used to train AI models, e.g., speech recognition. There are around 6,500 languages in the world. Unfortunately, much of today's AI technology exists only in major languages such as English, Spanish, and Mandarin Chinese. A major barrier to developing AI technology for all the other languages is the lack of training data. However, data collection is expensive, and we must invent ways to continuously collect and refine training data while minimizing the cognitive load on the people who contribute it. To this end, we propose a Raspberry Pi karaoke system that displays song lyrics on the piTFT screen and records speech input from users listening to the karaoke track through an earphone connected to the Pi.

A piTFT screen, an LED panel, a microphone, and an earphone are the primary components of the karaoke system. After a user chooses a song in the GUI displayed on the piTFT, the music video is played on the piTFT and the real-time frequency spectrum of the music playback is displayed on the LED panel. The user listens to the music through the earphone and sings into the microphone. The singing is recorded with the microphone, and after the song finishes, it is scored based on its correlation and consistency with the playback. The voice recording is then uploaded to the cloud along with other user and song metadata, where it could be used to further train a deep learning model for automatic speech recognition (ASR). Our goal is to create a diverse, high-quality voice-to-text training corpus for ASR by engaging users in a fun task, karaoke singing, in languages that have musical content with corresponding lyrics but immature ASR technology. This helps the system appeal to a broader audience, in contrast with current mechanisms such as Amazon Mechanical Turk, which impose a heavy cognitive load on contributors.

The LED panel we used is the Medium 16x32 RGB LED matrix panel produced by Adafruit. The panel requires 12 digital pins (6 for color data and 6 for control) and a 5V power supply. The pins are connected directly to the GPIO pins on the RPi.
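To illustrate how the real-time frequency spectrum of the playback might be mapped onto the 16x32 LED panel, here is a minimal sketch in pure Python. It is not the code used in this project: the function names, the frame length, the naive DFT, and the per-frame peak normalization are all assumptions made for illustration. Only the panel dimensions (32 columns, 16 rows) come from the text. The output is a list of bar heights, one per panel column; driving the panel itself is left out.

```python
import cmath
import math

PANEL_COLUMNS = 32  # panel width, from the text
PANEL_ROWS = 16     # panel height, from the text


def magnitude_spectrum(samples):
    """Return DFT magnitudes for bins 1..n//2 (naive O(n^2) DFT, DC bin skipped)."""
    n = len(samples)
    spectrum = []
    for k in range(1, n // 2 + 1):
        acc = 0j
        for t, x in enumerate(samples):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        spectrum.append(abs(acc))
    return spectrum


def spectrum_to_bars(samples, columns=PANEL_COLUMNS, rows=PANEL_ROWS):
    """Map one audio frame to `columns` bar heights in the range 0..rows.

    Hypothetical helper: adjacent frequency bins are grouped per column,
    and heights are scaled to the loudest column of the frame (an assumption).
    """
    mags = magnitude_spectrum(samples)
    if len(mags) < columns:
        raise ValueError("frame too short for the number of columns")
    per_col = len(mags) // columns
    levels = [max(mags[c * per_col:(c + 1) * per_col]) for c in range(columns)]
    peak = max(levels)
    if peak == 0:
        return [0] * columns  # silent frame: all bars off
    return [round(rows * level / peak) for level in levels]


if __name__ == "__main__":
    # A 128-sample test tone with exactly 10 cycles per frame.
    frame = [math.sin(2 * math.pi * 10 * t / 128) for t in range(128)]
    print(spectrum_to_bars(frame))
```

In a real-time setting, a fast FFT routine would replace the naive DFT, and each returned height would be drawn as a vertical bar in the matching panel column.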