The Brain-Computer Interface: The Frontier of Speech Tech
Automatic speech recognition is all about processing words that are spoken. But what about unarticulated speech? You might have seen the assistive technologies used by physicist Stephen Hawking—who had ALS, a paralyzing neuromuscular disease—to communicate his brilliant ideas to the world. Several million people worldwide lose the ability to articulate speech because of strokes or conditions such as ALS.
Artificial intelligence, together with advances in brain-computer interface (BCI) technology, offers a ray of hope for new assistive technologies for people who can no longer articulate speech.
The objective of BCI, also referred to as brain-machine interface, is quite simple: decode the signals in our brains. A computing device connected to the brain collects input signals, tries to decipher the information in those signals, and uses that knowledge to operate hardware or control software in the physical world. BCI has potential applications in home automation, gaming, and robotics; in healthcare, it can be used to build assistive technologies for patients. In a speech BCI, a user’s brain signals are used to generate (or reconstruct some form of) speech.
Speech BCI can be used to control home devices, talk to digital assistants, and navigate apps and websites. Speech BCI systems broadly fall into two categories: non-invasive recording devices and implants. For several years, implants were considered too risky, and medical professionals used them only in very rare cases. But BCI devices are now improving—implants may last longer and provide benefits to patients over a longer term—and could find greater use as an assistive technology.
Using neuroimaging studies, medical and speech researchers now have a fairly good understanding of how speech is represented in the brain. Combining this knowledge with AI techniques leads to new possibilities—a tool that can translate speech imagined in the brain but not vocalized because of an impairment. Based on patterns of activity detected in different areas of the cerebral cortex, AI models detect and classify words. A natural language processing model then figures out the most probable “next words” given the previous sequence of words (somewhat similar to predictive text on smartphones, but under different constraints and requirements). The end result is a prediction of the full sentence as the user attempts to vocalize the words.
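To make that two-stage idea concrete, here is a minimal, hypothetical sketch in Python. The neural-signal word classifier is stubbed out with made-up per-word probabilities, and a tiny bigram model stands in for the predictive-text step; none of the vocabulary, probabilities, or scores come from any real BCI system.

```python
import math

# Stage 1 (stub): per-attempt word probabilities that a real system would
# produce from cortical activity recorded while the user attempts to speak.
# These numbers are invented for illustration.
classifier_output = [
    {"I": 0.7, "you": 0.2, "am": 0.1},
    {"am": 0.6, "tired": 0.3, "thirsty": 0.1},
    {"thirsty": 0.5, "tired": 0.45, "you": 0.05},
]

# Stage 2: a toy bigram language model that scores how likely each word is
# to follow the previous one (smartphone-style predictive text).
BIGRAM = {
    ("<s>", "I"): 0.8, ("<s>", "you"): 0.2,
    ("I", "am"): 0.9, ("I", "tired"): 0.05, ("I", "thirsty"): 0.05,
    ("am", "thirsty"): 0.5, ("am", "tired"): 0.5,
}

def lm_prob(prev, word):
    # Small probability floor for word pairs the toy model has never seen.
    return BIGRAM.get((prev, word), 1e-3)

def beam_decode(outputs, beam_width=3, lm_weight=1.0):
    """Combine classifier and language-model scores with a small beam search."""
    beams = [(["<s>"], 0.0)]  # (word sequence so far, log score)
    for dist in outputs:
        candidates = []
        for seq, score in beams:
            for word, p in dist.items():
                new_score = (score + math.log(p)
                             + lm_weight * math.log(lm_prob(seq[-1], word)))
                candidates.append((seq + [word], new_score))
        # Keep only the highest-scoring partial sentences.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    best_seq, _ = beams[0]
    return " ".join(best_seq[1:])

print(beam_decode(classifier_output))
```

Running this prints “I am thirsty”: the sequence that best balances the classifier’s per-word guesses against the language model’s notion of a plausible sentence, which is the role the predictive-text step plays in the pipeline described above.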
Decoding full sentences this way can be a significant improvement over the assistive technology used by Stephen Hawking, where sentences were spelled out letter by letter (using a sensor that detected his cheek movements to control the computer cursor). That approach is cumbersome compared with sentence-level decoding, and researchers are exploring the latter for users whose cognition and language skills are intact after a stroke but whose oral movements are restricted and who aren’t able to produce intelligible sounds.
In one study, researchers demonstrated that words and sentences can be decoded with AI from electrical activity in the brain; they were able to decode 15 words per minute with a 25 percent word error rate. This is a significant improvement and raises hope for the future.
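For readers unfamiliar with the metric, word error rate is the minimum number of word substitutions, insertions, and deletions needed to turn the decoded sentence into what the user actually meant, divided by the length of the intended sentence. The short Python sketch below computes it with the standard edit-distance recurrence; the example sentences are invented for illustration and are not taken from the study.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Standard dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

# One substituted word in a four-word sentence -> 25 percent word error rate.
print(wer("I am very thirsty", "I am very tired"))  # 0.25
```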
Such decoding approaches have shown promise in studies conducted with people without speech impairment. But to train the AI models accurately, we need training data, and mapping brain activity to speech is difficult because suitable datasets are scarce. Data collection requires a neural implant to be placed, and if the data is collected from a user without impairment, the question remains whether that data is valid for users with impairments.
A few other issues also need to be thought through. Can we distinguish between internal monologue and intended speech (that is, speech that a user does not intend to vocalize versus speech that a user intends to vocalize but cannot because of an impairment)? Perhaps an approach that uses the signals generated as the user attempts to move the vocal apparatus could make such distinctions.
The accuracy of decoded speech must be very high for it to be useful, as users might not be in a position to easily provide feedback. Also, careful thought should go into the design of assistive technology so that the AI is transparent and secure. Users should have the ability to turn on, turn off, and override or veto the BCI output so that they preserve their agency in communication. Speech BCI is indeed an exciting and important frontier of speech technology and a great example of #AIforGood.
Kashyap Kompella is CEO of rpa2ai Research, a global AI industry analyst firm, and co-author of Practical Artificial Intelligence: An Enterprise Playbook.