Question: Why do you emphasize the Telephony Voice User Interface rather than simply telephone speech recognition?
In the report, I do talk in detail about applications of telephone speech recognition on an application-by-application basis. [Editor's note: See Table 1.] All of the product categories stand on their own, but I also wanted to emphasize the synergy between them, the broader trend.
Table 1: Telephone speech recognition product categories used in the Telephony Voice User Interface report
Technology licensing: The business of licensing basic recognition technology for use by others in creating products.
Digit recognition: Recognition of the digits and a few control words.
Voice dialing and control: Dialing by speaking a name, using a personal directory, and sometimes by number.
Voice-activated auto attendant: Directs calls when the caller speaks the name or department desired.
Voice-activated voice mail & unified messaging: Controls voice mail or unified messaging with spoken commands.
Telecommunications assistant: An integrated set of applications controlled by a consistent voice interface. Features usually include voice-activated voice mail, call screening, voice dialing, and one-number call-forwarding services.
IVR improvement: Customized interactions that provide limited call direction and/or information. Improves a touch-tone interface.
Call center automation: Using speech recognition to partially or fully automate a call-center application. (Distinguished from IVR improvement by performing activities usually requiring a customer service representative.)
Operator services: Automating some of the more routine aspects of operator duties by telephone service providers.
Directory assistance: Automatically providing telephone listings, including responses to "what city?" or automating frequently called listings.
Commercial directories: Voice-accessible "yellow pages," an automated directory service for businesses.
Specialized speech recognition products: Applications that do not fit other categories and that are not established as a full product category. This includes, for example, Internet access by phone (calling by phone to navigate the Internet by voice).
Personal telephony products: Telephony speech recognition that runs on a PC and typically deals with the telephone lines through a single-line voice modem.
The big step forward, the paradigm shift, is the huge improvement in the user interface of the telephone. The existing user interface for telephone automation is the touch-tone pad, a very limited input device.
Speech recognition makes possible the automation of many tasks that couldn't be automated before. The cost of making telephone calls will be dropping rapidly because of competition and Internet telephony; the cost of handling those calls at the receiving end can drop as well.
I referred to this change as a paradigm shift because callers will come to regard the telephone differently. It will be treated more like an electronic personal assistant than just a means of connecting to another person.
Many of the things that make the telephone annoying can go away. You won't have to remember the numbers of frequently called people; just say their names. You can be connected to someone at a company by saying their name rather than remembering an extension or having to spell the name on a keypad. You won't have to wait for long periods on hold to talk to a customer service representative. Tortuous, touch-tone-driven decision trees will go away. You will be able to get information you need immediately by picking up the telephone and asking for it. The whole user experience changes.
Question: Is current telephone speech recognition technology up to the task?
Yes. The rapid growth in deployed applications of high complexity has answered that question. Stock quotes on all 13,000 stocks and over 100,000 options are available over the telephone from Charles Schwab & Co. E*Trade has a system providing over-the-phone stock trading, in addition to quotes. Several airlines and travel companies are operating voice-driven travel information and reservation systems, although they currently serve a limited customer base (e.g., employees). United Parcel Service is recognizing long alphanumeric strings over the telephone to give package-tracking information. Voice-activated automated attendant systems are handling calls to thousands of employees at places like the Boston Globe. Directory assistance costs are being reduced using speech recognition by companies such as Bell Canada, Bell Atlantic, and US West.
These complex applications show how far speech recognition over the telephone can be taken. If these complex applications work, then applications with smaller vocabularies (voice dialing, digit recognition, IVR automation, and operator-services automation, for example) are even less limited by the performance of the technology.
And costs have dropped rapidly, both for the basic technology and for the hardware required to use that technology. Cost is now less of an issue.
Question: What role do text-to-speech and speaker verification play?
Both technologies add dimensions to the Telephony Voice User Interface. Text-to-speech allows cost-effective reading of material stored as text over the telephone; it is a key part of systems that read e-mail over the telephone. Speaker verification can make telephone transactions more secure. Verification has applications where it stands alone, and it can support speech recognition by making sensitive telephone transactions more secure.
Question: How can telephone companies take advantage of this?
Microsoft has demonstrated in PCs the power that control of the user interface provides. Its Windows Graphical User Interface (GUI) is what most buyers see as the PC.
In telephony, the Voice User Interface (VUI) plays that role. Customers will come to identify their telephone service with their "electronic assistant." They will invest time learning a specific VUI and entering information such as names for voice dialing and feature preferences. Telephone companies will be able to sell their service on the basis of features rather than purely price. Customers will be less likely to change services if they face learning another VUI and re-entering information.
In addition to keeping customers, the VUI supports the addition of services without overwhelming the customer. Optional services can generate additional revenue, much as cable companies increase revenues by selling premium channels.
Question: In that case, why don't we see more voice-enabled services by the telephone companies?
The leading telephone service companies have been slow to test speech recognition, usually trying a limited application such as voice dialing, with the expectation of building on that application's success. The danger of an incremental approach is that the first increment may not be enough to entice the customer to adopt a new user interface. To create a true base for growth, the first increment of voice-driven services must be of substantial value to customers. I'm not convinced that voice dialing alone is enough.
Some independent calling-card companies are trying to build their business by providing broader voice-enabled services. If the larger companies are too cautious, these energetic independents may capture a large share of the VUI market. They may become the customer's gateway to the telephone network. In the extreme case, the major telecommunications companies could lose control of the customer and become commodity "bit pipelines."
Question: What about organizations other than telephone service providers? What does this voice user interface mean to them?
Corporations and other organizations are looking at telephone speech recognition first as a way to improve the efficiency of call centers. This leads to direct cost savings in operations, and to indirect improvements in revenues as customers are better served.
Corporations can also use speech recognition to improve customer service. Speech recognition can let callers get information or get in touch with the right people more easily. Speech recognition can also make it more cost-effective to sell items over the telephone.
Companies now regard the Internet as an efficient way to provide centralized information to customers through a home page. I believe companies will also develop "voice home pages" accessible using speech recognition over the telephone. The telephone is considerably more available and faster than Internet access for most people.
Corporations are also looking at speech recognition as a means of improving the efficiency of employees. Some voice-activated auto attendants, for example, are used to make it easier for employees to contact each other. Telecommunications assistants help mobile employees service customers better. In particular, a good telecommunications assistant can make an individual or a small company look much larger.
Question: What about telephone equipment manufacturers? Are they in position to take advantage of this change?
Equipment vendors, such as sellers of interactive voice response systems or computer telephony platforms, tell me that the availability of speech recognition on their platforms is already becoming an important customer criterion for purchase. Selling a speech-recognition option provides opportunities for incremental revenues. More fundamentally, speech recognition will increase automated services over the telephone, increasing the number of applications of telephone equipment.
Question: Do you see opportunities for new businesses to be created?
Interactive touch-tone equipment created businesses and services. Speech recognition creates new opportunities that imaginative entrepreneurs will explore. One new company recently launched a service to test English proficiency automatically over the telephone using speech recognition, for example.
Question: How fast will telephone speech recognition grow?
I forecast a hundred-fold growth between 1997 and 2003 in equipment and service revenues that would not exist except for speech recognition. [See Table 2.]
Table 2: Revenues from advanced speech technology products and services in telephony, worldwide ($ million)

| Technology | 1996 | 1997 | 1998 | 1999 | 2000 | 2001 | 2002 | 2003 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Speech recognition | 235 | 408 | 970 | 2,093 | 4,899 | 11,623 | 21,496 | 36,822 |
| Speaker verification | 3 | 10 | 27 | 77 | 165 | 284 | 461 | 747 |
| Text-to-speech | 58 | 82 | 131 | 231 | 292 | 306 | 385 | 501 |
| Total | 296 | 500 | 1,128 | 2,401 | 5,356 | 12,213 | 22,342 | 38,070 |

Source: "The Telephony Voice User Interface," TMA Associates (Tarzana, CA).
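[Editor's note: As a rough arithmetic check on Table 2, the sketch below computes the growth multiple and the implied compound annual growth rate for each row between 1997 and 2003. The revenue figures are copied from the table. The function names (`totals`, `growth_multiple`, `cagr`) and the use of a compound-annual-rate formula are illustrative assumptions, not part of the report.]

```python
# Revenue figures ($ million) copied from Table 2, for the years 1996-2003.
YEARS = list(range(1996, 2004))
REVENUE = {
    "Speech recognition": [235, 408, 970, 2093, 4899, 11623, 21496, 36822],
    "Speaker verification": [3, 10, 27, 77, 165, 284, 461, 747],
    "Text-to-speech": [58, 82, 131, 231, 292, 306, 385, 501],
}


def totals(revenue):
    """Sum the per-technology rows year by year."""
    return [sum(column) for column in zip(*revenue.values())]


def growth_multiple(series, start_year, end_year):
    """Ratio of end-year revenue to start-year revenue."""
    start = series[YEARS.index(start_year)]
    end = series[YEARS.index(end_year)]
    return end / start


def cagr(series, start_year, end_year):
    """Compound annual growth rate implied by the two endpoint values."""
    periods = end_year - start_year
    return growth_multiple(series, start_year, end_year) ** (1 / periods) - 1


if __name__ == "__main__":
    rows = dict(REVENUE, Total=totals(REVENUE))
    for name, series in rows.items():
        multiple = growth_multiple(series, 1997, 2003)
        rate = cagr(series, 1997, 2003)
        print(f"{name:22s} {multiple:6.1f}x  {rate:6.1%} per year")
```

[Editor's note: On these figures, speech recognition revenue grows about 90-fold and total revenue about 76-fold from 1997 to 2003, which corresponds to revenues roughly doubling each year. Measured from the 1996 column instead, the total grows about 129-fold, so "hundred-fold" is best read as an order-of-magnitude figure.]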
Question: That's a remarkable growth rate. Can you give some examples of what drives it?
I admit that I felt uncomfortable predicting such explosive growth. Part of the fast growth results from starting from a small base: speech recognition is just beginning to become important in telephony. But most of the growth results from opportunities that seemed to me to be inexorable. The growth comes from looking at specific opportunities and how they can grow, considering both the potential market and barriers to reaching that potential.
For example, it is hard for me to believe that voice-activated automated attendants will not be the way most business phones are eventually answered. It is a huge improvement over needing to know a direct line or remember an extension, or trying to spell a name on the keypad. I believe there will be substantial progress toward this end by 2003.
And there is synergy. The more voice-activated services there are, the more callers come to understand how to react to a speech-recognition system. The more successes callers have with speech recognition, the more they will come to expect it. Companies will no longer perceive that adding speech recognition carries a risk of customer rejection.
One speech-activated service leads to demands for more features, features that can be added with a voice user interface. Voice-dialing services may expand to telecommunications assistants. Voice-activated auto attendants will add product directories. A true paradigm shift means that the growth is limited only by the imagination of product and service developers.
William S. Meisel, Ph.D., is president of TMA Associates, a consulting and publishing firm. He is the publisher and editor of Speech Recognition Update newsletter.
Meisel has over 20 years of experience in speech recognition, including founding and running a speech-recognition company for ten years. Meisel obtained his BS degree from the California Institute of Technology (Caltech) and his MS and Ph.D. degrees in Electrical Engineering from the University of Southern California. Meisel can be contacted at TMA Associates, (818)708-0962 or tmainfo@tmaa.com.