SpeechTEK '97 Draws Corporate Buyers to New York
Continuous Speech Dictation Products and Improving Vertical Market Applications Are Highlights of Third Annual Exhibition and Conference. Two views of the event.
SpeechTEK '97 showcased many new developments in the speech field and showed how the technology is advancing and moving into new markets. Following are two reports on the show. The first, by editor Brian Lewis, details the interest of corporate buyers in a variety of vertical markets. The second, from Peter Fleming and Robert Anderson, discusses how the advances in speech dictation and recognition products were received.
The dramatic improvements in continuous speech and the increasing number of speech applications available in vertical markets generated considerable interest on the floor of the SpeechTEK exhibition and conference held recently in New York. Much of the attention on the show floor was devoted to the continuous speech dictation offerings of IBM (ViaVoice) and Dragon (NaturallySpeaking). Shortly before the show, both companies released large-vocabulary, continuous speech dictation products, which observers regard as a significant step forward for the industry. The improvements in continuous speech, which allow users to speak naturally without pausing between each word, remove what has long been a stumbling block for speech recognition.
Other products on display at SpeechTEK '97 also represented dramatic improvements compared to the first show in 1995, and even to last year's. Speech recognition is now easy to use and speech synthesis is natural sounding.
The conference and exhibition attracted buyers from America's top corporations, including Bell Atlantic, Charles Schwab, Chase Manhattan Bank, Citibank, Coca-Cola, Compaq, General Motors, Merrill Lynch, Motorola, PaineWebber, Prudential, Texas Instruments and United Airlines. Nearly 2,000 executives attended, an increase of 50% over the previous year. There was a similar increase in the number of exhibiting companies, which grew from 30 last year to 45 at the most recent SpeechTEK.
Vertical Markets
The industry has begun to reach into vertical markets in a way it has not in the past. There were specific real-world speech recognition applications on the floor for the insurance, legal and medical fields. There was also considerable interest in the financial service applications, which were much more robust than in previous years. With improved noise cancellation and recognition, it has become easier than ever for people to buy stocks securely over the phone.
Philips Speech Processing has long pursued a corporate policy of preparing speech applications in partnership with companies directly involved in the vertical markets, and the results were in evidence at the SpeechTEK '97 show. Hansjakob Schlaich of Lufthansa showed how Philips SpeechMania is used in a Lufthansa call center application called ALF (Actual Lufthansa Flight-information). ALF provides arrival and departure times and gate locations for Lufthansa flights on the current day. Its speech recognition and understanding features an active vocabulary of over 1,000 words, including 300 airports with 700 variations and 100 expressions for time.
The Lorain, Ohio-based law firm of Wickens, Hertzer & Panza uses several speech modules from Philips as well as U.S. MicroShare's litigation context to gain a competitive advantage. Three Philips products are used at the law office: SpeechNote, a professional dictation and transcription package which operates in a Windows environment; SpeechMagic, natural speech recognition software which processes speech input and delivers text output; and SpeechFlow, a PC LAN-based digital dictation product with an advanced workflow management system. Philips also has a similar system in use at several health care facilities, where the citation system makes effective use of "trigger words": saying a specific word generates a complete medical definition of a symptom or illness.
Some Additional Highlights
There were many other indications of the speech industry's growth and acceptance:
- Andrea Electronics Corp. introduced its first wireless, infra-red PC headset with patented Active Noise Cancellation microphone technology.
- Keyware Technologies launched the telephone application for VoiceGuardian, the company's voice verification software solution for accurate, non-intrusive authentication of individuals.
- Syracuse Language Systems exhibited a Spanish self-study package, the only language learning software program to use IBM ViaVoice speech recognition, which responds to the user like a real listener, permitting a natural vocal style.
- Nuance Communications and three of its largest customers, Charles Schwab, American Express and British Airways, demonstrated how overall customer service improved through the use of over-the-phone speech recognition and natural language understanding technology in call centers.
New Developments in Speech: A visitor's view of SpeechTEK '97 Features Improved Dictation, Recognition Products
By Peter Fleming and Robert Anderson
The third annual SpeechTEK show, held recently in New York, was busy and intense, and was highlighted by new developments, many of which Jo Lernout discussed in his keynote speech. The founder of Lernout & Hauspie outlined his company's attempts to develop more accurate speech recognition methods. He also predicted that future machine synthesis of language would be so lifelike that users could not distinguish between recorded speech and machine-synthesized speech.
Microsoft's Investment
Much of the discussion after the keynote and on the floor of the show dealt with Microsoft's recent investment of $45 million in Lernout & Hauspie. It is generally seen as a significant step towards the widespread acceptance of speech. The improving sales of speech recognition products, as well as the more widespread interest and excitement, are other indicators that speech technology is maturing.
Dragon Systems and IBM showed the phenomenal leaps in performance made by continuous recognizers for general English. We reviewed these products in the previous issue, and elsewhere in the current issue we discuss how they respond to training. Dragon exhibited a new Deluxe version of its NaturallySpeaking continuous general English speech recognition system, which has a number of important new features compared with the initial release. First, it allows multiple users, each with their own voice file. More importantly, the speaker's voice is recorded behind the recognized words, so that one may play back a recording of what one has said. This allows more accurate correction.
Synthesizers
A French-based company, Elan, demonstrated its speech synthesis products, which are available in multiple languages including American English, although the company reports it is still developing a more robust American English accent for future speech synthesis products. IBM's ViaVoice continues to use the speech synthesizer by Eloquent, which features multiple speaker voices and the ability to adjust pitch and speed. Within Windows 95, one may drag and drop any document onto the speech synthesis icon and have it read back. Users have mixed impressions of the synthesized voice. One reviewer found the voice to be quite clear and charming, while another found it to be useless and objectionable.
The Dragon Deluxe edition will also allow users to create their own macros, such as a macro that enters dictated material into different fields of a database form. IBM's ViaVoice was demonstrated by many developers who had incorporated it into interesting applications. There are rumors that IBM plans to release a ViaVoice Gold product with additions such as improved navigation, more command and control, and the ability to dictate, spell, or correct by voice within the correction dialogue box. A very important feature of the current IBM ViaVoice is the deferred correction paradigm, which allows anyone to correct the dictated material at a future time. This feature makes the product attractive in settings such as law firms and hospitals, where deferred correction is routinely performed by other staff.
Philips demonstrated several of its continuous dictation modules for specialized vocabulary in markets such as law, medicine, and mental health. The Philips dictation systems allow multiple users on a server to dictate documents into a central source, from which they can be transmitted to transcriptionists. Screens allow the dictator and transcriber to view the progress of the document and thus obtain quick feedback on the status of the production process. The initial recognition of the material by computer allows the transcriptionist to work faster and produce more final text. Manual transcription remains an option within the Philips system. Through a partners alliance program, Philips allows partners to develop specialized dictation modules in their particular areas. People have reported good recognition results with the Philips handheld digital recorder.
Lernout & Hauspie spoke about their speech synthesis capability with the addition of Berkeley Systems to their family, as well as the company's linguistic background in Belgium with its many languages. We expect that Microsoft may try to utilize these languages in multilingual editions of its software products. Kurzweil/Lernout & Hauspie has shipped their command and control product, which allows detailed editing within Microsoft Word. This voice product uses natural language, allowing one to say particular editing commands within Microsoft Word in many ways. One can edit quickly and easily within the system by voice. In addition, Verbex's Listen for Windows, which has been out for a number of years, also allows continuous dictation of editing commands. Ultimately, one tends to focus on a single way of doing things. Editing and command and control functions will no doubt be incorporated into continuous speech recognition programs, with more natural language features that allow the user to say the commands in natural and alternative ways.
Users are advised to try products and to mix and match to find the best possible combination for their work. Different products tend to match different users' needs. Certainly the production of first drafts by continuous dictation speech recognition is a great step forward.
Switchers
A growing area is telephony products, where computers are used for voice mail messaging and sophisticated related functions. Another group of products this year related to telephone and speech recognition were the switchers exhibited by Plantronics and VXI and anticipated from Andrea Electronics. The Plantronics switcher allows the user, with a single press of a button, to switch back and forth between speaking on the telephone and dictating to the computer. Someone on the phone can therefore make speech recognition voice notes on the computer during a telephone conversation without the person on the other end hearing what is dictated. VXI has a competing product with an added feature: pushing or releasing the button is entirely silent, so the person on the phone is unaware of your actions.
Wireless Microphones
Andrea and Shure both exhibited headset radio microphones that allow the user to dictate while moving about. The Shure wireless microphone was a one-way microphone. Andrea plans a wireless microphone which also has receiver feedback and is therefore two-way. We had the opportunity at SpeechTEK to see many types of headsets and handheld microphones, including products from Shure, Andrea and VXI, and elements from Gentex which are used by VXI and in some of Plantronics' equipment as well.
The advent of the switcher between the computer microphone and the telephone, while so simple, lends a new dimension of functionality to speech recognition, allowing speech recognition data input, database entry, and note taking while on the telephone. Switchers are already being used by some large corporations and mail-order houses.
Teaching Language
Another new application of speech recognition is language teaching, either in foreign language instruction or in speech therapy. Syracuse Language Systems presented one of the first foreign language teaching products to use continuous speech recognition. This inexpensive teaching module, with software on CD-ROMs and audio cassette tapes, uses IBM ViaVoice technology to teach the user to speak Spanish. Syracuse Language also offers language instruction over the Internet: students can send recorded speech files to human instructors for critique and interact with teachers, which in the future may include telephone contact. This method of language teaching is far more cost-effective than individual instruction.
A similar method of instruction in speech language pathology has been designed using a Unisys system for the speech language rehabilitation of stroke victims who have difficulty producing correct language. Such speech therapy programs allow the user hundreds of hours of low-cost therapy, which can supplement or replace hands-on live language therapy. One can imagine computer-based therapy programs for dyslexia and other language disorders, which may be a significant new area for speech recognition. Maybe we'll see it next year at SpeechTEK '98.
The show is scheduled for next year on Oct. 27 and 28, again at the New York Hilton & Towers. For more information, contact Scott C. Temple, show manager, at (203) 834-1122 or by e-mail at stemple@comtekexpo.com.
Peter Fleming and Robert Andersen, speech recognition consultants, may be reached at aris@world.std.com or (617) 923-9356.
Companies and Suppliers Mentioned