Virtual Assistants: Speech's next "Killer App"
The bar for interactive speech tools may have been set artificially high by years of science fiction movies such as 2001: A Space Odyssey and Star Trek, but these hugely popular vehicles have captured the public's imagination and raised its expectations. The public's high-performance expectations for interactive speech tools were essentially unattainable until recently. Improving speech recognition accuracy alone is not enough to reach such a lofty goal. The speech industry needs a killer application that will legitimize the use of speech with the masses. Several companies are betting heavily that speech-controlled Virtual Assistants will be that killer app.

So what is a "Virtual Assistant"? It's a speech-based interactive software agent with anthropomorphic (human-like) qualities and behavior, whose purpose is to perform the basic telephone and electronic tasks of a good personal assistant or secretary. Virtual Assistants are, first and foremost, phone management tools with their roots in the telephone environment. Secondarily, they are intended to handle a range of data and unified messaging tasks.

How often do you actually connect with your intended target on the first attempt? In 1995, AT&T determined that 75% of all business calls reached voice mail (which, of course, pleased the carriers, since each one required an additional call). Driven primarily by the advent of voice mail, we have shifted our voice communications from the synchronous model of connecting live to the asynchronous model of passive messaging. Some companies are realizing that there is a large cost associated with this "store and forward" process: the actual elapsed time spent completing communication transactions (in person-hours), combined with the opportunities lost to fragmented communication and its delays. For companies engaged in high-value transactions (such as commercial real estate brokers or investment bankers), this cost more than justifies exploring new technology solutions. In a sense, Virtual Assistants offer the opportunity to hire back dedicated assistants to streamline and coordinate communication for a fraction of the cost of their human equivalents.
Development
Development in this arena is taking two directions: 1) enterprise turnkey solutions, sold as customer premises equipment (CPE) for on-site installation, either integrated with or replacing your PBX; and 2) Public Network subscription solutions sold through Service Providers (LECs, CLECs, LD and wireless carriers, and independent service providers). This discussion focuses on the Public Network solutions, which are available to individuals and groups of any size. The first and most widely publicized speech-activated Virtual Assistant is Wildfire, introduced to consumers in 1995. Webley Systems followed in mid-1997, and most recently, in July of this year, General Magic introduced Portico.

To be thorough, it should be mentioned that many DTMF-based pseudo Virtual Assistants (all the telephone switching features without the speech activation or human persona) have also emerged. These include AccessLine, ESA by StarTouch, Vlink by ILink, Call Sciences Personal Assistant, Prairie Systems Virtual Office and Premiere Technology's Orchestrate, to name a few. Most of these DTMF-based products are traditional PBX-switching solutions with enhanced-services options. All of these manufacturers have indicated that they are either in the midst of adding, or shortly intend to add, voice recognition capabilities.

In the meantime, Wildfire, Webley and Portico are all available nationwide now through service providers and resellers. All three share a core set of qualities: 1) a unique voice, identity and personality; 2) a basic set of communications features, including one-number follow, call screening and announce, voice contact file and dialing, and voice mail; and 3) a common pricing model: a basic monthly fee plus a per-minute charge, excluding long distance. (Portico suggests its long distance is included, but it charges double session minutes when making calls, effectively nullifying that claim.)
Promotional offers aside, base prices range from $15 to $45, with per-minute charges averaging around 15 cents. Some quick arithmetic suggests that 1,000 minutes of use (roughly 45 minutes per business day) plus the base fee will produce a monthly bill of about $165 to $195, or $150 to $200 in round numbers, for any of these products. In fact, for serious power users needing all the features, price will most likely not be a differentiating factor for the next few years; all three companies have similar cost factors built into their designs.
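The bill estimate above can be checked with a short Python sketch. It assumes a flat base fee plus a single average per-minute rate (the article's 15 cents), ignores long distance and promotional offers, and assumes 22 business days per month; `monthly_bill_cents` is a hypothetical helper, and amounts are kept in integer cents so the arithmetic is exact.

```python
# Hypothetical estimator for the pricing model described above:
# monthly base fee + metered session minutes (long distance excluded).
# Amounts are integer cents so the arithmetic is exact.

def monthly_bill_cents(base_fee_cents: int, minutes: int, rate_cents_per_min: int) -> int:
    """Return the monthly bill in cents for one subscriber."""
    return base_fee_cents + minutes * rate_cents_per_min


MINUTES = 1_000        # monthly usage from the example above
BUSINESS_DAYS = 22     # assumed number of business days per month
RATE_CENTS = 15        # average per-minute charge

low = monthly_bill_cents(1_500, MINUTES, RATE_CENTS)   # $15 base fee
high = monthly_bill_cents(4_500, MINUTES, RATE_CENTS)  # $45 base fee

print(f"{MINUTES / BUSINESS_DAYS:.1f} minutes per business day")  # 45.5
print(f"${low / 100:.2f} to ${high / 100:.2f} per month")          # $165.00 to $195.00
```

At 1,000 minutes the metered portion ($150) dominates the bill, which is why the $15 to $45 spread in base fees does little to separate the three products for heavy users.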
Design
The three manufacturers have used relatively similar architectures as they attempt to overthrow the carriers' traditional PBX thinking. All three products are based on Pentium Pro servers running Unix, eliminating any use of costly dumb-switch technology. They all use a cluster of telephony servers with separate database servers to serve a series of Voice Recognition Units (VRUs). Those that are web-enabled (Portico and Webley) also include web servers in the cluster. For Webley and Portico, Dialogic and Antares cards provide some of the speech recognition, telephony switching and noise cancellation, bridging calls over the SCSA bus. Wildfire instead chose Natural MicroSystems and its MVIP bus. None of the companies uses external PBX switches.

As of this writing, all three also rely primarily on software-based recognition, though Portico and Webley still use Antares boards to a limited extent (and thus carry some additional hardware cost). Portico and Webley use the Nuance6 natural language product. Wildfire, by contrast, has developed its own proprietary discrete pattern-matching speech recognition engine that is entirely software based and thus CPU driven, which in theory eliminates all DSP requirements and their associated costs. The recognition approach may be the only technology difference an end user will notice: Wildfire users must speak specific command phrases, while Webley and Portico use word spotting via Nuance6 to capture a wide range of similar statements for the same command.

Wildfire fervently rebuts the idea that this is a drawback. Its designers believe the continuous speech approach is overrated. They argue that by using CPU-powered recognizers tuned to provide 93% or better accuracy on the specific command set the Virtual Assistant application requires, their recognizers succeed even in extremely noisy environments, such as convertibles or payphones, where others might fail. They also suggest that there are truly not that many ways a user can say "Create an appointment," for example, and that giving users a specific command simply makes recognition more accurate and learning faster. Wildfire also believes that because its recognition is proprietary, its speech recognition costs are purely CPU costs, with no additional licensing or hardware fees paid to outside companies, which should help it drive down prices over the next few years.
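To make the fixed-phrase versus word-spotting distinction concrete, here is a toy Python sketch that works on already-recognized text. It is an illustration of the trade-off only, not how Wildfire's engine or Nuance6 works internally (real recognizers score acoustic hypotheses, not strings); the command names, phrases, and keyword sets are all hypothetical.

```python
import re
from typing import List, Optional

# Hypothetical command inventory. FIXED_PHRASES models the "say exactly this"
# style attributed to Wildfire; SPOTTED_KEYWORDS models word spotting as
# attributed to Webley and Portico. Both operate on already-recognized text.
FIXED_PHRASES = {
    "create an appointment": "CREATE_APPOINTMENT",
    "take a message": "TAKE_MESSAGE",
    "put my call through": "PUT_THROUGH",
}

SPOTTED_KEYWORDS = {
    "CREATE_APPOINTMENT": {"appointment", "schedule", "meeting"},
    "TAKE_MESSAGE": {"message", "voicemail"},
    "PUT_THROUGH": {"connect", "through", "transfer"},
}


def words(utterance: str) -> List[str]:
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z']+", utterance.lower())


def match_fixed(utterance: str) -> Optional[str]:
    """Accept only an exact command phrase; anything else is rejected."""
    return FIXED_PHRASES.get(" ".join(words(utterance)))


def match_spotted(utterance: str) -> Optional[str]:
    """Return the command with the most matching keywords; ignore other words.

    Ties are broken by dictionary order, which is one source of misfires.
    """
    tokens = set(words(utterance))
    hits = {cmd: len(keywords & tokens) for cmd, keywords in SPOTTED_KEYWORDS.items()}
    best = max(hits, key=hits.get)
    return best if hits[best] > 0 else None


for text in ["Create an appointment.", "I need to schedule a meeting",
             "put it through to the message center"]:
    print(f"{text!r}: fixed={match_fixed(text)}, spotted={match_spotted(text)}")
```

The fixed grammar rejects the paraphrase "I need to schedule a meeting" outright but never confuses its three targets. The spotter accepts the paraphrase, but on the last utterance it finds one keyword for each of two commands and, by tie-break order, picks TAKE_MESSAGE even though the speaker wanted the call put through. That is the same trade-off the two camps are arguing about.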
Strategy
Beyond that engineering distinction, the products diverge more in approach and marketing as each aspires to dominate this marketplace. Wildfire, as the first in the category, is a formidable incumbent. Its character is well developed, with a friendly, warm, yet professional female voice (recorded demo: (800) 545-WILD). Its interface is the most personal of the three, the result of a three-year head start in developing and fine-tuning the interaction. However, its feature set, while the most robust by telephone (Wildfire adds on-the-fly conferencing, call-on-call, reminders, voice creation of contacts, and many other features to the basic set shared by all), is not currently available over the web.

Wildfire's forward strategy is focused almost entirely on reducing price, which for this sort of product means reducing the feature set. Cost for these products is driven almost entirely by hold times, which determine the maximum number of users that can be loaded onto each available port; fewer features mean shorter sessions and more users per port. Wildfire has developed a reduced feature set, currently going to market as Network Wildfire, that eliminates the all-important call-on-call, conferencing, groupware messaging, scheduled reminders, and other minor features. The user can then be introduced to the concept of a Virtual Assistant at a theoretically lower price and perhaps someday ramp up (albeit unknowingly) to the full-featured model and pricing. By offering classes of service that restrict these features, carriers may be able to load their switches heavily while still introducing customers to Wildfire (a "lite" version, anyway!). Wildfire is also hoping that carriers will bundle the Wildfire Assistant as a hidden component of wireless or long-distance minutes in the name of reducing churn, thus disguising its true cost.

Webley elected to use a male British voice and is pursuing a butler motif (recorded demo: (888) 4WE-BLEY).
The affected voice of the British butler has met with mixed reviews. However, Webley is in the midst of creating two new voices, one male and one female, which are to be completed by the end of this year. And what Webley lacks in soothing personality, it more than makes up for in features. First, as its name suggests, Webley offers web access to data, a critical advantage for those who are focused on unified messaging or are simply more data-centric. Web access is valuable for maintaining contact information, updating setup information, and reviewing emails and faxes, which Webley, unlike Wildfire, can also receive. Webley has also spent a great deal of time on conferencing and interconnectivity between users. Dialogic has developed a stand-alone server solution dedicated to its telephony cards: all ports are bridged on the SCSA bus, so in theory all inbound lines are part of one very large trunk group. The Webley Assistant has been designed to support large conference bridges among its users easily, regardless of which servers receive the calls. Users can also monitor conference activity from a Java web applet, since all components of Webley are web-enabled.

General Magic's Portico is the Wall Street favorite, possibly because General Magic is the only one of the three companies that is publicly traded (NASDAQ: GMGC). It has $45 million in spending money to market the service. The other companies, being privately held, do not publish their balance sheets, but both seem well funded by venture capitalists and private placements from Fortune 100 companies. General Magic is also several months ahead of Webley in implementing Nuance6, on which it has built a custom grammar branded as Magic Talk. Its marketing focuses heavily on the free-form speech enabled by its continuous speech recognizers.
You can hear a recorded demo at (800) POR-TICO. Some people find these advanced recognition promises unfulfilled: barge-in is occasionally triggered accidentally by a sniffle, a throat clear, or a passing car, and the free-form speech is so free-form that new users are sometimes at a loss as to which commands to use, especially if they cannot visualize where they are in the desktop paradigm Portico employs. Having said all that, the Portico web interface is, in our view, the best. It offers robust contact synchronization between the Virtual Assistant and contact managers such as ACT! and Outlook. It also has a news watch service and a stock quote service similar to the eTrade or Schwab tools. And apart from the missing call-on-call and conferencing (two very big feature gaps for a heavy phone user, since a Portico user can be on the system, in session and available, yet still miss a call to voice mail just like the old days!), the most basic phone functions are ready to go, as with the others. General Magic is also currently the largest and best-funded of the three organizations; it certainly has the wherewithal to correct its acknowledged weaknesses, along with a significant budget to get the word out. That budget may be the most important tool at this stage of the industry's development, because ultimately, getting the word out is what will matter most.

For all these companies, the true battle is over marketing and distribution, not better features or a lower price point. To win it, they will need the world to accept the value proposition that spending $100 or more per month on a Virtual Assistant will improve one's business performance and enhance the bottom line, at a fraction of the cost of hiring a human personal assistant. That is a tough proposition when compared with getting by on $6.95 voice mail.
The good news for the speech industry is that this simple killer application is touching thousands of users and hundreds of thousands of their callers every day. Each caller's experience further reinforces the paradigm shift toward speaking in plain English directly to the computer.
Joshua Touber is President and founder of Virtuosity, established in 1995 as the first national service provider dedicated to Virtual Assistants and associated tools. He can be reached at 323.466.2800 or via email at Josh@Virtuosity.com. For more information, visit Virtuosity's web site at www.virtuosity.com.
Sidebar: First Person Dialogue
It is the human-like qualities that make this product category unique. All of the major Virtual Assistant products speak to the user in the first person and even appear to have a self-awareness, or an ego. They are not just IVR navigation tools converted to accept speech commands for selecting menus, but actual human-like characters with human-like responses. Here's an example of how it sounds:
Virtual Assistant: "Hello, I'm the assistant to Joshua Touber. Please say your full name."
Caller: "Donna Anderson."
VA: "Oh hi. He left a message for you. Would you like to hear it?"
Donna: "Yes."
VA: (playing Josh's voice) "Donna, I had to run out to the store and my cell phone may not be in range, but I'll be there for dinner at 7."
VA: "Please say 'put my call through' or 'take a message.'"
Donna: "Put my call through."
VA: "Okay."
(Josh's cell phone rings.)
VA (to Josh): "Hello, I'm trying to reach Josh Touber."
Josh: "It's me."
VA: "Donna Anderson is calling."
Josh: "I'll take it."
VA: "Go ahead."
Josh: "Donna, did you get my message?"
Donna: "Yeah, you'll be here at 7. Just thought I'd try to reach you and ask you to grab some dessert."
Josh: "Sure, no problem."
VA: "Excuse me, Mrs. Chavez from MCI is calling."
Josh: "Take a message."
Donna: "Who was that?"
Josh: "Oh, just another dang telemarketer. Anyway, I'll be there in a sec... See you soon."
Donna: "Okay, bye."
Josh: "Bye."
So what's the big deal? The big deal is that Donna and I actually talked to each other. Donna called my Virtual Assistant number, and it knew where to find me. And it handled other valuable tasks at the same time (like saving me from an unwanted sales call).
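The call flow in this sidebar can be summarized as a small routing routine. The Python sketch below is a hypothetical model of what the caller hears (name capture, playing a message left for that caller, one-number follow, announce and screen, and call-on-call), not any vendor's implementation. The function name, the ordered follow-me list, the "office" number, and the callbacks standing in for speech recognition are all assumptions.

```python
from typing import Callable, Dict, List


def handle_call(
    caller: str,
    wants_connect: bool,
    follow_me_numbers: List[str],
    answers: Callable[[str], bool],
    accepts: Callable[[str], bool],
    messages_for: Dict[str, str],
    log: List[str],
) -> str:
    """Route one inbound call; return "connected" or "voice mail"."""
    # Caller has already said their name; play any message left for them.
    if caller in messages_for:
        log.append(f"play message for {caller}: {messages_for[caller]}")
    # Caller chooses "put my call through" or "take a message".
    if wants_connect:
        # One-number follow: ring each number in order until one is answered.
        for number in follow_me_numbers:
            if not answers(number):
                log.append(f"no answer at {number}")
                continue
            # Announce the caller and let the subscriber screen the call.
            # If the subscriber is already on a call, this is call-on-call.
            log.append(f"announce {caller} at {number}")
            if accepts(caller):
                log.append(f"connect {caller}")
                return "connected"
            break  # Subscriber declined; stop ringing other numbers.
    log.append(f"take message from {caller}")
    return "voice mail"


# Usage modeled loosely on the sidebar: Josh is reachable only on his cell.
log: List[str] = []
numbers = ["office", "cell"]
reachable = {"cell"}
wanted = {"Donna Anderson"}
messages = {"Donna Anderson": "I'll be there for dinner at 7."}

r1 = handle_call("Donna Anderson", True, numbers,
                 lambda n: n in reachable, lambda c: c in wanted, messages, log)
# Mrs. Chavez arrives while Josh is talking (call-on-call); he declines.
r2 = handle_call("Mrs. Chavez", True, numbers,
                 lambda n: n in reachable, lambda c: c in wanted, messages, log)
print(r1, r2)  # connected voice mail
```

Stopping after a declined announcement, rather than ringing the remaining numbers, matches the "Take a message" outcome in the dialogue; a real service would also need timeouts and a way to resume the first call, which this sketch leaves out.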