Voices in Harmony

Speech technology is rapidly altering the way enterprises communicate with their customers. While adoption has so far been gradual, the pace of change is beginning to quicken. Where is speech technology heading in the next five years? The answer is complex and depends on variables such as the rate of adoption by enterprises, the pace of consumer acceptance and organizational lag within enterprises. But the promise of speech is very clear.

Let’s picture a scenario five years in the future. A caller picks up the phone and, instead of a humming dial tone, hears a natural-language voice prompt asking: “What would you like to do today, Jim?” Jim requests the United Airlines voice site, again using natural language: “I’d like to go to United Airlines.” Once his request is recognized, Jim is presented with the United Airlines site, where he uses speech to check the status of his flight for that afternoon. When he is done, he issues the voice command “Go Home,” which takes him back to where he started. He then decides to visit the Fidelity Investments site to check his portfolio, again using speech commands. Finding a significant change in the value of one of his stocks, he decides to talk to his broker and issues the voice command “Call Carl Smith.” When that conversation is finished, he again says “Go Home.” Jim has checked his flight status and investment portfolio and talked to his broker, all in the course of one continuous phone call, and all using speech.

The unanswered question is how we get from where we are today to this world of easy, intelligent, speech-driven communication and convenient self-service.

The Voice Browser Alternative

A new Voice Browser model is evolving from the convergence of customer service solutions across multiple touch points, enabled by the emergence of standards such as VoiceXML and SALT.
Voice Browser service providers offer a robust managed speech platform designed to be shared by channels, partners and enterprise clients. Speech recognition solutions built on standards-based technology can be tested, deployed and managed on the shared platform, benefiting from best-of-breed management, monitoring and reporting tools.

The Voice Browser hosting model allows enterprises to leverage their existing e-business infrastructure for VoiceXML applications that are rendered in a network-based voice browser. The application and content remain within the enterprise infrastructure and generate VoiceXML, which the Voice Browser service provider’s platform fetches and renders using the standard Web paradigm, much as a Web browser fetches and renders HTML.

The most likely providers of Voice Browser service are today’s turnkey hosted service providers and carriers. Both have already invested in platforms designed to be shared, and both have the experience and infrastructure to deliver carrier-grade service level agreements. For enterprises that opt for the Voice Browser model, the benefits include reduced capital expenditures, better utilization of existing IT resources, lower operating costs and, importantly, the ability to scale rapidly on demand.

Multiplying Choices in Content and Applications

As acceptance of speech accelerates, providers of both content and applications will proliferate. Content Providers offer services such as news, weather, sports, driving directions and stock quotes. Application Providers create speech solutions or modules such as order entry, order status, 401(k) fund management, voice-activated email and dealer locators. Enterprises will be able to choose and configure content and applications easily, offering richer and more effective speech services more quickly and at lower cost.
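Whether a service is built by the enterprise itself or licensed from a Content or Application Provider, it reaches the caller the same way: as a VoiceXML document served over HTTP and rendered by the hosted voice browser. As an illustration only, the Python sketch below shows a hypothetical stock-quote content service building a minimal VoiceXML 2.0 document with the standard library’s `xml.etree.ElementTree`. The function name, the quote values and the wording of the prompt are assumptions, not part of any provider’s actual service.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the W3C VoiceXML 2.0 Recommendation.
VXML_NS = "http://www.w3.org/2001/vxml"

# Serialize VoiceXML elements with a default xmlns="..." instead of ns0: prefixes.
# Note: this registration is module-global ElementTree state.
ET.register_namespace("", VXML_NS)


def _q(tag: str) -> str:
    """Qualify a tag name with the VoiceXML namespace."""
    return f"{{{VXML_NS}}}{tag}"


def stock_quote_vxml(symbol: str, price: float, change: float) -> str:
    """Render one stock quote as a minimal VoiceXML 2.0 document.

    Hypothetical content service: a real provider would pull these values
    from its own back-end systems rather than take them as arguments.
    """
    root = ET.Element(_q("vxml"), {"version": "2.0"})
    form = ET.SubElement(root, _q("form"), {"id": "quote"})
    block = ET.SubElement(form, _q("block"))
    prompt = ET.SubElement(block, _q("prompt"))
    direction = "up" if change >= 0 else "down"
    prompt.text = (
        f"{symbol} is trading at {price:.2f} dollars, "
        f"{direction} {abs(change):.2f} today."
    )
    # ElementTree escapes special characters (e.g. "&") in the prompt text.
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(stock_quote_vxml("ACME", 12.5, -0.25))
```

In a deployment along these lines, the returned string would be served from an ordinary Web server with the `application/voicexml+xml` media type, and the voice browser would fetch it exactly as a Web browser fetches a page. Building the document with an XML library rather than by string concatenation keeps it well-formed even when a company name contains characters such as `&`.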
These services will be offered on an outsourcing basis, or perhaps through licensing arrangements, and will be available under any of the speech application platform models already discussed, as long as standards-based technology such as VoiceXML (and, in the future, SALT) is employed. The key evolution here is the proliferation of alternatives, which allows an enterprise to deliver more varied speech services to its customers more rapidly. Enterprises will be able to build portions of their applications, integrate them with applications from third-party providers, and add independently created content to deliver comprehensive speech solutions in a timely manner.

The Sound of the Future: Collaboration and Interoperability

In the near term we will see more and more speech applications enabled on all platforms. These applications will be created by enterprises using internal resources, external professional services organizations and application service providers, and they will incorporate independently provided content. However, this growing speech landscape still consists entirely of independent speech applications, accessible one at a time, each providing a well-defined set of services. In any one call, a caller is limited to the applications and content provided by a single enterprise. Each application, if you will, is isolated in its own soundproof booth.

The next step in the evolution of speech is to forge links between these isolated applications, giving callers the flexibility to obtain more value during a single phone call. The concepts that will begin this evolution are collaboration and interoperability.

Collaboration enables a caller in one enterprise’s application to request content from another enterprise through the voice browser, a smart VoiceXML interpreter. The voice browser recognizes the request and loads the appropriate VoiceXML content from the other enterprise’s site.
For example, a caller who had checked current flight status information from United Airlines could then request car rental information from Hertz. The speech platform would understand the request and enable the corresponding content. This would be possible because United and Hertz had agreed to share each other’s content. This convenient sharing of applications and content between enterprises is collaboration.

Other enterprises might also want to share their content and applications, but prefer to enable them on their own internal platform or on some other third-party platform, perhaps because of enhanced platform requirements or significant security constraints. For example, assume a voice portal enabled by a wireless provider receives the original call. When the caller says “Investments” to check her portfolio, the call is transferred to the Fidelity site, where she is served. Once she completes her transactions with Fidelity, she can return to the voice portal with a hot word such as “Go Home,” and from there access other content. This is interoperability.

Interoperability enables a caller to move from a voice site on one platform to a site on a different platform, and back, during a single phone call, using speech commands for navigation. This ability to move between platforms within one call while maintaining a consistent user experience is what distinguishes interoperability from collaboration. Interoperability depends heavily on the adoption of Voice over IP (VoIP) and the Session Initiation Protocol (SIP); a network that delivers voice and data services in a unified manner is critical to its implementation.

Collaboration and interoperability will link today’s independent voice applications, enabling value chains that span multiple organizations.
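The difference between collaboration and interoperability comes down to one question the platform asks on each spoken request: is the requested site hosted on the platform already handling the call? The Python sketch below is a simplified model under stated assumptions, not a description of any real voice browser. The `Site` and `CallSession` classes, the lower-case directory keys, the platform labels and the URLs are all hypothetical; speech recognition is replaced by exact string matching, and a cross-platform transfer (which in practice would rely on VoIP signaling such as SIP) is represented only by a returned label.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class Site:
    name: str
    platform: str  # hypothetical label for the speech platform hosting the site
    url: str       # hypothetical VoiceXML entry point for the site


@dataclass
class CallSession:
    """Tracks where one caller is during a single phone call (illustrative only)."""
    home: Site
    # Spoken request (lower-case) -> Site; stands in for a speech-activated directory.
    directory: Dict[str, Site]
    current: Site = field(init=False)

    def __post_init__(self) -> None:
        # Every call starts at the home site, e.g. a wireless carrier's voice portal.
        self.current = self.home

    def request(self, utterance: str) -> str:
        """Return the action the platform would take for a recognized utterance."""
        phrase = utterance.strip().lower()
        if phrase == "go home":
            # Hot word: always returns the caller to the home site.
            target = self.home
        else:
            target = self.directory.get(phrase)
            if target is None:
                return "reprompt"  # request not found in the directory
        if target.platform == self.current.platform:
            # Collaboration: the same voice browser fetches the partner's VoiceXML.
            action = f"fetch {target.url}"
        else:
            # Interoperability: the call is handed to another platform (e.g. via SIP).
            action = f"transfer to {target.platform}"
        self.current = target
        return action


if __name__ == "__main__":
    portal = Site("Voice portal", "carrier", "http://portal.example.com/home.vxml")
    united = Site("United Airlines", "carrier", "http://united.example.com/status.vxml")
    hertz = Site("Hertz", "carrier", "http://hertz.example.com/rental.vxml")
    fidelity = Site("Fidelity", "fidelity", "http://fidelity.example.com/start.vxml")
    call = CallSession(home=portal, directory={
        "united airlines": united, "hertz": hertz, "investments": fidelity})
    for said in ["United Airlines", "Hertz", "Go Home", "Investments", "Go Home"]:
        print(f"{said!r}: {call.request(said)}")
```

Keeping the “Go Home” hot word outside the directory means the caller can return to the portal from any site, on any platform, with the same command, which is what gives the experience its consistency across platforms.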
Callers will be able to access complementary content during the course of a single phone call, and enterprises will be able to reach a broader set of customers.

What’s Next? The Speech Web

The Speech Web is the unification of the telephone network and the Internet, enabling a caller to access diverse, disparately located content and applications through a user-friendly voice browser, all during a single call. Its key components will include a voice browser, a speech-activated directory service, and a standard protocol to enable interoperability (efforts are currently under way at the W3C). The emergence of the Voice Browser hosting model is also key, because demand for providers that originate Speech Web service will be paramount.

Like the Web, the Speech Web will make any content accessible through any compatible voice browser. As it evolves, Jim (the friendly caller who began our odyssey) will truly be able to enjoy ubiquitous access to virtually unlimited content via speech. And the enterprise of the future will benefit enormously from a highly collaborative ecosystem that enables it to deliver and share feature-rich services in the spirit of the Internet.

Vishal Dhawan is the vice president of technology, Speech Solutions Line of Business, for iBasis.
