  • February 5, 2025
  • FYI

Researchers Expand On-Device Speech Recognition


With one in four people around the world using speech recognition technology regularly and the number of speech-enabled devices in the billions, it’d be an understatement to say that speech recognition has gone mainstream. But there has been one major limitation: Speech recognition technology has traditionally relied on an internet connection because of the large amount of temporary random access memory (RAM) that it requires.

That could change, though, as researchers at the University of Copenhagen in Denmark and several European colleagues have developed an algorithm that eliminates the need for an internet connection when using speech recognition on small devices. This algorithm, created by Panagiotis Karras from the University of Copenhagen’s Department of Computer Science; linguist Athanasios Katsamanis of the Institute for Language and Speech Processing at the Athena Research Center in Greece; Martino Ciaperoni from Aalto University in Finland; and Aristides Gionis of the KTH Royal Institute of Technology in Sweden, allows small devices like smartphones to decode speech without substantial memory or internet access.

The algorithm is significantly different in that it forgets what it doesn’t need in real time, according to the researchers.

“Unlike the existing gold-standard algorithm [called Viterbi] used since speech recognition’s early days, our algorithm only stores a fraction of the processing data, serving as a set of coordinates. With these, an entire sequence can be reconstructed, which makes speech recognition possible with significantly less RAM,” Katsamanis explains in the research paper.

“Certain small devices can already recognize and act based upon a few words without internet connectivity. For example, a smart home system can recognize keywords such as ‘turn on’ or ‘turn off.’ This is known as small-vocabulary speech recognition. With our algorithm, it will be possible to recognize more extensive instructions or, in principle, entire languages, without an internet connection. This is referred to as large-vocabulary speech recognition,” Karras writes.

Speech recognition fundamentally works by matching phonemes, the smallest units of sound that distinguish one word from another, against a library of corresponding sounds, Karras explains. “Probabilities are calculated for matches and the subsequent combinations that go on to form our words and sentences. The most likely sequences are calculated and the software translates these sounds into text,” he adds.
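To make the idea of "most likely sequences" concrete, here is a minimal sketch of the classic Viterbi algorithm, the existing standard the researchers mention, run on a toy model. The two phoneme-like states ("AH", "EE"), the three acoustic labels ("low", "mid", "high"), and every probability are made-up values for illustration only; a real recognizer would use far larger models.

```python
import math

# Hypothetical toy model: all names and probabilities below are illustrative.
STATES = ["AH", "EE"]

def log_table(table):
    """Convert a flat dict of probabilities to log-probabilities."""
    return {key: math.log(p) for key, p in table.items()}

LOG_START = log_table({"AH": 0.6, "EE": 0.4})
LOG_TRANS = {
    "AH": log_table({"AH": 0.7, "EE": 0.3}),
    "EE": log_table({"AH": 0.4, "EE": 0.6}),
}
LOG_EMIT = {
    "AH": log_table({"low": 0.5, "mid": 0.4, "high": 0.1}),
    "EE": log_table({"low": 0.1, "mid": 0.3, "high": 0.6}),
}

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return the most likely state sequence for the observations."""
    # score[s]: best log-probability of any path that ends in state s so far.
    score = {s: log_start[s] + log_emit[s][obs[0]] for s in states}
    backptrs = []  # one table per time step, so this grows with input length
    for o in obs[1:]:
        new_score, ptr = {}, {}
        for s in states:
            # Best predecessor of s at this step.
            prev = max(states, key=lambda p: score[p] + log_trans[p][s])
            new_score[s] = score[prev] + log_trans[prev][s] + log_emit[s][o]
            ptr[s] = prev
        score = new_score
        backptrs.append(ptr)
    # Trace back from the best final state to recover the whole sequence.
    path = [max(states, key=lambda s: score[s])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return path

toy_obs = ["low", "mid", "high"]
print(viterbi(toy_obs, STATES, LOG_START, LOG_TRANS, LOG_EMIT))
# ['AH', 'AH', 'EE']
```

Log-probabilities are used instead of raw probabilities so that long sequences do not underflow to zero.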

Current common algorithms require increased memory the longer one speaks because alternative combinations must remain open until the final sound is analyzed. The new algorithm does away with this problem, according to Karras.
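The memory growth described above comes from keeping a table of alternatives for every time step until the utterance ends. The article does not spell out how the researchers' algorithm works, so the sketch below is not their method. It illustrates a long-known, simpler trade-off called checkpointing: keep the scores only at every `block`-th step, then recompute the steps in between when tracing the answer back. The toy states, labels, probabilities, and function names are all hypothetical.

```python
import math

# Hypothetical toy model: all names and probabilities below are illustrative.
STATES = ["AH", "EE"]

def log_table(table):
    return {key: math.log(p) for key, p in table.items()}

LOG_START = log_table({"AH": 0.6, "EE": 0.4})
LOG_TRANS = {"AH": log_table({"AH": 0.7, "EE": 0.3}),
             "EE": log_table({"AH": 0.4, "EE": 0.6})}
LOG_EMIT = {"AH": log_table({"low": 0.5, "mid": 0.4, "high": 0.1}),
            "EE": log_table({"low": 0.1, "mid": 0.3, "high": 0.6})}

def step(score, o, states, log_trans, log_emit):
    """Advance by one observation; return (new scores, backpointers)."""
    new_score, ptr = {}, {}
    for s in states:
        prev = max(states, key=lambda p: score[p] + log_trans[p][s])
        new_score[s] = score[prev] + log_trans[prev][s] + log_emit[s][o]
        ptr[s] = prev
    return new_score, ptr

def checkpointed_viterbi(obs, states, log_start, log_trans, log_emit, block=None):
    """Most likely state sequence, storing scores only at checkpoints."""
    n = len(obs)
    if block is None:
        block = max(1, math.isqrt(n))  # roughly sqrt(n) checkpoints
    # Pass 1: run forward, keeping score vectors only at times 0, block, 2*block, ...
    score = {s: log_start[s] + log_emit[s][obs[0]] for s in states}
    checkpoints = {0: score}
    for t in range(1, n):
        score, _ = step(score, obs[t], states, log_trans, log_emit)
        if t % block == 0:
            checkpoints[t] = score
    path = [max(states, key=lambda s: score[s])]  # best state at the last step
    # Pass 2: walk the segments from last to first, recomputing each one.
    end = n - 1
    for start in sorted(checkpoints, reverse=True):
        if start == end:
            continue  # the last step is itself a checkpoint; nothing to redo
        score = checkpoints[start]
        ptrs = []  # backpointers for this segment only (at most `block` tables)
        for t in range(start + 1, end + 1):
            score, ptr = step(score, obs[t], states, log_trans, log_emit)
            ptrs.append(ptr)
        for ptr in reversed(ptrs):
            path.append(ptr[path[-1]])
        end = start
    path.reverse()
    return path

toy_obs = ["low", "mid", "high"]
print(checkpointed_viterbi(toy_obs, STATES, LOG_START, LOG_TRANS, LOG_EMIT))
# ['AH', 'AH', 'EE']
```

With `block` near the square root of the input length, the stored tables grow roughly with the square root of the length rather than with the length itself, at the cost of about one extra forward pass.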

The researchers acknowledge that their new algorithm requires slightly more time and computational power, but they say the difference is negligible.

The researchers are currently seeking a patent for the new algorithm and its unique code. They hope that the new algorithm could pave the way for speech recognition to be used anywhere, even in situations where security is paramount.

“This algorithm can help democratize language technology by making information more accessible. Making translation tools and speech assistants available regardless of internet access will allow more people to engage in society. In particular, it will help people without written language skills or those with physical disabilities, by enabling them to understand and influence societal decisions,” Katsamanis says in the paper.

And because the algorithm doesn’t need an internet connection, it significantly reduces the risk of data loss from an internet hack, the researchers assert.

Another benefit, they assert, is lower energy consumption, because the algorithm sharply reduces the need for temporary memory storage. “It is vital to reduce energy consumption to minimize reliance on fossil fuels, as many data centers still use these energy sources,” Karras states in the paper.
