Mozilla Announces Open Source Common Voice Speech Recognition Datasets
Mozilla has announced an expansion of its crowdsourced Common Voice project. The project, which is about a year old, is building an open source dataset for voice recognition. It is now opening up to more languages, and Mozilla wants volunteers from across the globe to read short pieces of text aloud and record them through a web or mobile app.
According to VentureBeat, “Mozilla launched the first fruits of its Common Voice datasets in English back in November, a collection that contained some 500 hours of speech and constituted 400,000 recordings from 20,000 individuals. Today, Mozilla officially kick starts the process of collecting voice data for three more languages — French, German, and — a little randomly — Welsh. Another 40 tongues are currently being prepped for the data collection process, with the likes of Brazilian Portuguese, Chinese (Taiwan), Indonesian, Polish, and Dutch already halfway toward being ready to start crowdsourcing voice data.”