March 1, 2008
By Ryan Joe
FYI

Overheard/Underheard

The speech-related developments spotlighted in this column don’t quite warrant a full news story, but they’re still too eccentric for us to pass up.

>>>>> In November 2007, the National Endowment for the Arts released a report confirming what everybody already knows: Americans don’t read. Fortunately, Chinese manufacturer aigo released the aigopen, a device roughly the size of a large felt-tip marker that scans a page of text and reads the content out loud in either Chinese or English. Now Americans never have to read again! The bad news: It only works on books with digitally watermarked paper.
Despite its limitations, the technology still holds great promise. Aigo announced that it has become an official partner of the Olympic Museum’s traveling exhibit. Ahead of the 2008 Beijing Olympics, aigo has published a city guide that works with the pen. The company anticipates the pen will become an effective tourist companion, helping to ease culture shock and facilitate greater interaction with foreign, digitally watermarked environments.

>>>>> Computerized lip-reading isn’t the most obvious use of speech technology, probably because, by definition, lip-reading doesn’t involve a lot of audio information. However, researchers at the University of East Anglia in the United Kingdom are using what is known about audio speech to recognize visual speech. The connection, according to the abstract from the UEA’s School of Computing Sciences and the Ministry of Defence, is the inherent multimodality of speech.
In their clean, British syntax: "Generally, under ideal conditions the visual information is redundant and speech perception is dominated by audition, but increasing significance is placed on vision as the auditory signal degrades." So far, the technology’s accuracy ratings vary.
Researchers say computerized lip-reading will benefit counterterrorism and law enforcement units.
It’s speech technology without the sound.

>>>>> And speaking of criminals, rapper Prodigy is in jail for his third gun charge. In the meantime, his album H.N.I.C. Part 2 will drop this month. It’s the perfect music to accompany Speech Technology magazine. If you want, you can even listen to Prodigy rap in Spanish, thanks to Voxonic’s proprietary voice-conversion technology. At press time, the company was negotiating to translate the lyrics into German, French, and Italian.
A Wired article detailed the technology’s process: Lyrics are hand-translated into a given language and rerecorded by a professional speaker. Voxonic’s technology creates a voice model by ripping phonemes from the original recording and layering them over the professional speaker’s translated lyrics. Hence, new sound; same smooth, gangsta flava.
