IBM's New Toolkit Connects with a Family of Developers
Speaking to your computer and having it do what you say has been a goal of IBM's speech recognition research team for more than 25 years.
IBM Research has long believed that the next leap in computer usage and productivity will come from a revolution in the user interface. In pursuing that goal, IBM has chosen to focus almost entirely on a statistical approach to the speech recognition problem.
Under this approach, the computer listens for specific linguistic patterns and learns statistical correlations between sounds and the words they represent. The system then makes an educated guess about what is being said, based on a statistical database of previously spoken material.
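To make the "educated guess" concrete, here is a toy sketch in Python of one common way statistical recognizers combine evidence. It picks the candidate that maximizes the acoustic likelihood P(sounds | phrase) multiplied by a prior P(phrase), where the prior is estimated from previously spoken material. This is not IBM's implementation. The phrase counts and acoustic scores below are invented for illustration, and a real system would compute the acoustic term from the audio itself.

```python
import math

# Hypothetical counts of how often each phrase appeared in previously
# spoken material (the "statistical database"); invented for illustration.
PHRASE_COUNTS = {"recognize speech": 40, "wreck a nice beach": 1}

# Hypothetical acoustic likelihoods P(sounds | phrase) for one utterance.
# The two phrases sound almost alike, so their scores are close.
ACOUSTIC_LIKELIHOOD = {"recognize speech": 0.30, "wreck a nice beach": 0.32}


def best_guess(candidates):
    """Return the candidate maximizing log P(sounds | phrase) + log P(phrase)."""
    total = sum(PHRASE_COUNTS.values())

    def score(phrase):
        prior = PHRASE_COUNTS[phrase] / total  # relative frequency as the prior
        return math.log(ACOUSTIC_LIKELIHOOD[phrase]) + math.log(prior)

    return max(candidates, key=score)


print(best_guess(["recognize speech", "wreck a nice beach"]))  # recognize speech
```

The acoustic scores alone would slightly favor "wreck a nice beach." The prior drawn from past material tips the decision toward the far more common phrase.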
For years, this effort was hampered by hardware that was simply not fast enough to handle the huge amount of computation speech recognition requires.
Driven by Defense Department funding in the 1960s and 1970s, research continued, and new, faster processing methods were developed. During this time, IBM focused on giving the computer a statistical sense of "context," which allowed speech systems to anticipate which sound was likely to come next.
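The idea of statistical "context" can be illustrated with a bigram model. A bigram model estimates how likely each sound is to follow the one before it. The sketch below is a hypothetical, simplified example, not IBM's system. The phone-like symbols and training words are made up, and the probabilities are plain relative frequencies with no smoothing.

```python
from collections import Counter, defaultdict


def train_bigrams(sequences):
    """Count how often each symbol is immediately followed by each other symbol."""
    follows = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            follows[prev][nxt] += 1
    return follows


def next_probabilities(follows, prev):
    """Relative-frequency estimate of P(next | prev); empty if prev was never seen."""
    counts = follows.get(prev, Counter())
    total = sum(counts.values())
    return {sym: c / total for sym, c in counts.items()} if total else {}


# Hypothetical training data: phone-like symbols for a few spoken words.
training = [
    ["s", "t", "aa", "p"],       # "stop"
    ["s", "t", "aa", "r", "t"],  # "start"
    ["s", "p", "iy", "ch"],      # "speech"
]
model = train_bigrams(training)

# After "s", the model expects "t" (seen 2 times out of 3) more than "p" (1 of 3).
print(next_probabilities(model, "s"))
```

A recognizer can use these probabilities to favor likely continuations when the acoustic evidence for the next sound is ambiguous.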
The next step was to move from understanding sounds to understanding words. A technique called dynamic programming allowed IBM to "time warp" the spoken input, stretching or compressing it so the system could match the speech sample against patterns it already knew.
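Dynamic time warping, the dynamic-programming technique behind the "time warp," can be sketched in a few lines of Python. This toy version makes two simplifying assumptions. First, each frame of speech is reduced to a single number, whereas real recognizers compare vectors of acoustic features. Second, the frame distance is the absolute difference. The template and input values are invented. The function finds the cheapest alignment that lets either sequence stretch in time, so a word spoken slowly can still match its stored pattern.

```python
def dtw_distance(a, b):
    """Minimum cumulative cost of aligning sequences a and b,
    allowing either sequence to be stretched in time."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local distance between two frames
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances while b's frame repeats
                cost[i][j - 1],      # b advances while a's frame repeats
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]


template = [1, 3, 4, 3, 1]          # hypothetical stored pattern for a word
slow = [1, 1, 3, 3, 4, 4, 3, 1]     # the same shape, spoken more slowly
print(dtw_distance(template, slow))  # 0.0
```

A plain frame-by-frame comparison would fail here because the two sequences have different lengths. Allowing a frame to repeat is what absorbs the difference in speaking rate.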
By the end of the 1970s, IBM had achieved a speaker-dependent, discrete-speech program with a vocabulary of about 1,000 words. Once trained to a specific speaker, the system could achieve 85% accuracy. This may seem primitive now, but at the time it was the highest mark in recognition technology.
In 1984, IBM delivered the world's first real demonstration of speech recognition: a system that could take discrete dictation from a speaker who had been "trained" into it, with 95% recognition accuracy.
The goal now was to produce a real-world application.
New processors brought faster computation, and users gained the ability to add new words to the system. As a result, the system could actually learn from past errors.
In 1992, IBM introduced its first dictation system and the following year launched the personal dictation system for OS/2.
Earlier this year, IBM launched a new release of this product, VoiceType 3.0, which does not require an adapter card. VoiceType 3.0 also delivers isolated speech dictation and continuous command and control without the need for training. With a simple vocal command, users can activate applications or even browse the Internet by saying where they want to go.
IBM has also released a toolkit that allows application developers to make tomorrow's high-performance software "speech aware." IBM recently announced its VoiceType Developers Toolkit Version 3.0 for Windows 95, which enables IBM VoiceType speech technology to be built into new or existing applications.
SpeechTEK
At the recent SpeechTEK show in New York, IBM and many of the developers using the toolkit were on hand to discuss the kit and how it can help developers integrate speech directly into applications as they are developed.
"We are looking at this as a marriage of IBM's 25 years of speech research and the needs of the developers," said Anne Marie Derouault, worldwide speech marketing and sales executive for IBM.
"One of the toughest challenges developers face today is the ability to access the basic resources required to translate exciting technology into a business opportunity," said Jan Winston, worldwide manager of IBM Speech Systems. "The toolkit will help developers achieve this goal by giving them the fundamental tools and information they need to build powerful speech technology into traditional or next-generation applications."
The toolkit is designed to support the development of both continuous and discrete dictation applications. Developers are currently working with IBM to deliver a wide range of applications, including interactive games, productivity tools, virtual secretaries, personal information managers, educational programs and Internet applications.
One of the most exciting applications is voice-enabled e-mail, developed by Typhoon Software of Santa Barbara, Calif.
"We are very excited by the potential of productivity enhancements of this VoiceType-based third-party solution for our GroupWise users," said Novell's Richard Weir, chief of staff of the corporate technology office, referring to Novell's joint effort with IBM and Typhoon on SRAPI-enabled speech technology.
The Typhoon product will voice-enable GroupWise software through a simple and inexpensive add-on, said Philip Myers, Typhoon president and CEO. Typhoon works with American project managers and Russian computer programmers to provide software solutions for companies doing business in the former Soviet Union.
Pact with Eloquent
IBM and Eloquent Technology Inc. (ETI) have reached an agreement under which they will work closely to integrate Eloquent's text-to-speech functions into future IBM products and applications that are part of the IBM VoiceType family.
ETI will continue to license and support its toolkit product under the ETI-Eloquence trademark, but will now do so in cooperation with IBM.
"We are very pleased that IBM recognized the potential of our technology," said Sue Hertz, president of Eloquent Technology, Inc. "This agreement will make it easier for developers to create a broad variety of speech-enabled products, and will provide users with access to many interactive applications that take advantage of a combined speech-to-text/text-to-speech product and toolkit."
IBM VoiceType Partners
A&G Graphics Interface, Inc., a software and consulting company in Cambridge, MA, has released CustomVoice, which works with the VoiceType 3.0 engine and comes packaged with a set of rapid application prototyping and development tools for creating 32-bit applications.
Android Technologies Inc. specializes in the development of custom voice recognition applications for use by medium to large corporations, or for applications targeted for mass market distribution.
Bugz Software Inc. of Toronto, Canada, is planning to release a CD-ROM title that incorporates IBM's VoiceType speech recognition software and features a selection of 10 interactive comedic characters called Chipheads.
Chant Inc. of Marina Del Rey, CA, uses the IBM speech engine with Smalltalk, a tool for rapid application development, allowing developers to use speech when programming applications. The Smalltalk development environment is an object-oriented alternative to programming applications in C and C++.
Courseware Publishing International of Cupertino, Calif., a producer of interactive multimedia courseware, has reached licensing agreements with the Infogrames Entertainment Group in France and Atica Multimedia in Brazil for two new English-as-a-second-language titles that use IBM VoiceType speech recognition.
DelRey Software of Irving, TX, has formed a consulting and development service in support of IBM VoiceType Dictation, acting as a subcontractor to write code for speech-aware products.
ENW International of St. Michaels, MD, uses IBM VoiceType 3.0 to add speech-aware features to EROICA (Efficient Resource for Output, Input, Calculation and Analysis) SpeechDialer, a utility for rapid telephone dialing from anywhere on the Windows desktop.
ISMRA, based at the University of Caen in France, has implemented a conversational agent for navigating MRI (Magnetic Resonance Imaging) brain images. ISMRA uses reconstructed surface images and linguistic modules of syntax and dialogue, together with IBM VoiceType APIs, to move around the brain and query specific regions (language, vision, smell, etc.) about their functions.
NCC, Inc. of Scottsdale, AZ, uses IBM voice recognition technology to transcribe recorded dictation accurately into text for the DD1200/Digital Dictate, a lightweight, portable hand-held digital recorder for use with Digital Dictate 3.0 software.
Speech Solutions, Inc., of Philadelphia, PA, uses VoiceType with the Speech Solutions Voice Tools ActiveX custom controls. This unique set of custom controls enables Windows programmers to make full use of VoiceType's free dictation capabilities, as well as its voice command and control capabilities, through any web browser that supports the ActiveX standard. These tools eliminate the need for programming at the API level: developers can simply drag and drop speech recognition directly into a custom application. The speaker-independent voice recognition product can be used right out of the box with no training. Because the IBM system works with a SoundBlaster (or compatible) sound card, there is no need for an additional voice card in the computer.
Speech Solutions recently changed its name from ProNotes. The company is a manufacturer of dictation systems for medical professionals and has been developing software for IBM VoiceType Speech Recognition since 1993.
Speech Technologies Inc. of Naperville, IL, uses the IBM VoiceType speech recognition technology in Speech Recognition Expert, a new toolkit which includes both command/control and dictation custom controls.
Voice Pilot Technologies of Miami, FL, uses the IBM speech engine in VoicePilot Version 3.0, which the company bills as the first PC-based application that is totally speech-aware, hands-free and able to perform any function without a keyboard or a mouse.
Voice Pilot's Windows 95 version won the Comdex "Best of Show" award from Byte magazine.
VoiceQuest of Sarasota, FL, uses IBM's speech engine with the Verbal Information Exchange Systems (VIES) to compose interactive, human-like language models that can exchange information with the ease of speech. VIES combines complex speech and grammar into a point-and-click database that lets users develop interactive, speech-aware applications. VoiceQuest also makes Nicky, an automated receptionist that allows interactive communication using speech recognition.