Accent Detection AI Promises to Improve Speech Recognition
Speech technology professionals have long agreed that accents are one of the main hurdles to better speech recognition. But now, researchers at Cisco, the Moscow Institute of Physics and Technology, and the Higher School of Economics are investigating a possible solution. In “Foreign English Accent Adjustment by Learning Phonetic Patterns,” the researchers describe how their system leveraged dialectal differences in diction and intonation to create new accented samples of words, which it then learned to recognize with accuracy comparable to that of similar systems.
According to VentureBeat, “the team sourced its data from the Carnegie Mellon University (CMU) Pronouncing Dictionary, which contains thousands of audio recordings of English speakers reading common words. Traditionally, when training a system on a new accent, phonologists have to manually extract features known as phonological generalizations to represent the difference between General American English (GAE) — spoken English lacking distinctly regional or ethnic characteristics — and an audio sample of a distinct accent, but that sort of hard-coding doesn’t tend to scale well.” Now, though, the researchers’ model can generalize those rules automatically.
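For readers who want a feel for what those dictionary entries look like, the short Python sketch below (which assumes NLTK’s packaged copy of the CMU Pronouncing Dictionary) prints the GAE phoneme transcriptions for a few common words; these reference pronunciations are what accented speech ultimately gets mapped back onto.

```python
# A minimal look at the CMU Pronouncing Dictionary via NLTK (assumes
# `pip install nltk`; the cmudict corpus is downloaded on first run).
import nltk

nltk.download("cmudict", quiet=True)
from nltk.corpus import cmudict

gae = cmudict.dict()  # word -> list of ARPAbet phoneme sequences

for word in ["water", "little", "thought"]:
    # Each entry is a GAE pronunciation; an accented speaker may
    # substitute, drop, or insert phonemes relative to these sequences.
    print(word, gae[word][0])
# e.g. water -> ['W', 'AO1', 'T', 'ER0']
```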
The researchers combined information from George Mason University’s Speech Accent Archive with unique sounds from the CMU dictionary. Ultimately, the researchers’ system was able to predict pronunciations by making replacements, deletions, and insertions to input words.
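The learned rules themselves aren’t reproduced in the report, but a toy Python sketch illustrates the kind of replacement, deletion, and insertion edits involved; the specific rules below are invented for illustration only, not taken from the paper.

```python
from typing import List

# Hand-written toy rules standing in for the generalizations the model
# learns automatically from data; they are illustrative, not the paper's.
def accent_variant(phonemes: List[str]) -> List[str]:
    """Apply replacement, deletion, and insertion edits to a GAE
    phoneme sequence to produce a hypothetical accented variant."""
    out: List[str] = []
    for p in phonemes:
        if p == "TH":        # replacement: 'th' realized as 't'
            out.append("T")
        elif p == "HH":      # deletion: initial 'h' dropped
            continue
        else:
            out.append(p)
    if out and out[-1] == "Z":
        out.append("AH0")    # insertion: epenthetic vowel after a final 'z'
    return out

print(accent_variant(["TH", "IH1", "NG", "K"]))  # -> ['T', 'IH1', 'NG', 'K']
print(accent_variant(["HH", "AW1", "S"]))        # -> ['AW1', 'S']
```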
VentureBeat reports, “The team used the model to generate a phonetic dataset they fed into a recurrent neural network — a type of neural network commonly employed in speech recognition tasks — that tried to get rid of unnecessary sounds and change them so that they didn’t deviate too far from the GAE word versions. After training on 800,000 samples, it was able to recognize accented words with 59% accuracy.”
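The report doesn’t detail the network’s architecture, but a minimal sketch of the idea, written here in PyTorch purely as an assumption, is a recurrent model that reads an accented phoneme sequence and, at each position, predicts either a GAE phoneme or a special delete symbol that discards an unnecessary sound.

```python
import torch
import torch.nn as nn

# Minimal recurrent-model sketch (PyTorch and the per-phoneme tagging
# framing are assumptions; the paper's exact architecture is not given
# here). Each accented phoneme is mapped to a GAE phoneme or to a
# special <del> class, letting the network drop unnecessary sounds.
class AccentNormalizer(nn.Module):
    def __init__(self, n_phonemes: int, emb_dim: int = 64, hidden: int = 128):
        super().__init__()
        self.embed = nn.Embedding(n_phonemes, emb_dim)
        self.rnn = nn.GRU(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_phonemes + 1)  # +1 for <del>

    def forward(self, phoneme_ids: torch.Tensor) -> torch.Tensor:
        x = self.embed(phoneme_ids)   # (batch, seq, emb_dim)
        h, _ = self.rnn(x)            # (batch, seq, 2 * hidden)
        return self.out(h)            # logits over GAE phonemes + <del>

# Toy usage: a batch of two accented phoneme-ID sequences of length 5.
model = AccentNormalizer(n_phonemes=40)
batch = torch.randint(0, 40, (2, 5))
print(model(batch).shape)  # torch.Size([2, 5, 41])
```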
Because the CMU dictionary contained fewer sounds than the GMU archive, the model was not able to learn all of the archive’s phonetic generalizations, only 13 out of 20. The team reports it has managed to increase the size of the CMU dataset from 103,000 phonetic transcriptions with a single accent to one million samples with multiple accents.