October 1, 2002
Q & A

Piyush Modi, Vice President of Media Processing Technologies, IP Unity

Q What is the IP Unity mission?
A Our mission is to provide carriers with tools to generate new revenue streams through enhanced services. We are changing the way people communicate with each other by bringing the telephone network and Internet together, making voice, data and video work together. We are also changing the way people interact with machines by providing natural, intuitive, multimodal interfaces. Finally we provide our team with a great place to work and a chance to make a difference to the world of communication, and we aim to provide our shareholders with exceptional value.

Q What vertical markets are the strongest drivers/implementers of the services speech technology has to offer?
A There are three main markets -- embedded, enterprise and carrier. In the embedded market, speech technology is now in PDAs, cell phones and also in business devices, like the stocky PDAs used for stocking merchandise. In the enterprise market, businesses of all sizes use speech technology platforms to help with the workflow and to automate tasks. Carriers can take enterprise-type functions and offer the speech resources to multiple enterprises, eventually extending enhanced services to individual consumers.

IP Unity is strongest in the carrier market, but all three markets need to integrate seamlessly at some level. Imagine being lost while driving, and getting directions on your cell phone or car navigation system. Simple voice manipulation can occur within your navigation tool or handset, but if you were in a hands-busy, eyes-busy situation, you might opt to have the directions read to you. The ASR and text-to-speech engines required to do this may well reside at the carrier level. To the user it doesn't matter where the voice processing is taking place; it just has to deliver a seamless service. In fact, speech is just one of the modalities at play here: our lost driver might prefer to click a map on the navigation tool to enlarge it. The perfect natural interface is one that allows users to get information in any way they want.

Q Why should service providers and carriers deploy speech solutions?
A The value proposition of speech solutions is twofold. They automate services, and they allow users to interact with and access information and people in the most natural manner. The first part is about reducing operational costs and the second is about realizing revenue growth by delivering innovative communication services.

As the current voice-based communication over the telephone channel migrates to multimedia communication, the challenge for carriers is to provide user interfaces that evolve from the one provided by the current black phone to the one that will be provided by future endpoints. The challenge is to seamlessly integrate text, sound, image and video information in a single communication channel. To facilitate interactions with such multimodal information bases using small devices, it is extremely important to voice-enable the endpoints. These voice-enabled endpoints will allow consumers to access managed services and people of their choice using natural-language, personalized speech and/or text interfaces. Such sticky services will not only deliver a value proposition to customers but also build loyalty between consumers and carriers.

Recent innovation in infrastructure using a decomposed, IP-based media server architecture makes it possible for carriers to integrate these expensive technology resources into their networks incrementally. It allows them to buy various functional elements from multiple vendors and then share these expensive assets across multiple services. Thus, it is the perfect time to deploy speech-enabled services across the large network to realize both operational savings and top-line revenue growth.

Q Can you provide real examples?
A Sure. In terms of technology implementation, look at the evolution of automated operator services over the past ten years. The technology resulted in billions of dollars saved by carriers, and people are completely comfortable interacting with the machine at the other end for everything from routing their calls in the network to accessing directory assistance.

Speech allows people to naturally communicate in order to reach their end goal. It's kind of like going back to the days when we had human switching operators. The operators used to know the callers and the callers often knew the operator. Based on their knowledge of the caller, the operator was able to give the person customized service, and we're trying to go back to that sort of natural intuitive interface using computers.

This is culminating in a very natural interface. Instead of just accepting a calling card number by voice, the carrier's computer will be more intuitive, ask something broad like "How may I help you?" and be able to interact with you.

I like to think in terms of an "enhanced services technology curve" to project when these services will be unleashed, and I think that by the end of 2003 people will be using these services in a routine manner without even knowing it.

Q What do you hear from your customers concerning speech technology?
A When big telcos build out their networks, they're mostly starting with legacy equipment. They want two things: to grow the top line, and to run their operations more effectively. Speech technology is essential to both of these aims. New services using speech provide carriers with the growth they need. The new media server-based architecture enables the incremental deployment of expensive speech resources, and also their usage across multiple services. Carriers have started to appreciate this value proposition and have started to ask for integration of speech resources in a network-agnostic manner behind the media server-based decomposed functional element architecture.

Specifically, almost all the major carriers have issued RFPs for the deployment of speech resources using media server-based architecture, and we anticipate that by the end of this year and the beginning of next year we should see full-scale deployments. We're gaining very good traction and evolving our platform to ultimately deliver seamlessly integrated text, voice, audio, image and video-enabled media processing services from the carrier's network to the intelligent endpoints of the future.

Q How does your platform change deployment of speech technology solutions?
A Two ways. First, the media server encompasses everything that is good about VoIP and the decomposed network architecture. The media server is also an interesting beast in that it doesn't have to reside within an IP system; it can just as well sit in a TDM system. This allows for migration.

The second point is that speech deployments are currently very cumbersome. You need to learn about speech technology, and fiddle with network adjustments and instrumentation. Once it is up and running, systems need tuning and baking. The basic problem is that the platforms in use today were not designed for speech. IP Unity designed a standards-based platform from the ground up, with a speech bus and ATM- or IP-based speech resources integrated into the platform using ASR middleware. It is tightly coupled. Our middleware is speech-engine agnostic, so while we currently use SpeechWorks, we can use any speech engine. And VXML allows the expensive voice resources to be shared across the network.

Q Where will this technology be in three to five years? What issues will it be helping customers solve?
A Convergence is happening in a big way. We now look beyond the traditional telcos to provide converged communication services. It's anybody's game now. Wireless carriers, cable operators, service bureaus: unique services are coming from all angles. Consumers will get the devices of their choice and interact with the media-integrated network in whatever capacity, and in whatever modality, they choose.

The interfaces on devices are becoming more complicated, and there is clearly not enough space on every device to add a keyboard to handle all of the functionality. That's where the implementation of speech technology is guaranteed. It enables access to the information in the most natural and economical manner. Speech functionality will become part of every endpoint. By the end of 2003, most people will be using speech-related services at least once a day. It is not unrealistic to expect that in the very near future people will count on their voice-enabled intelligent agent residing on a carrier's network to coordinate and customize their communications with people and information.

Q What should the speech technology industry as a whole be doing to increase the growth rate of speech technology deployments?
A On the technology front, algorithms need to become smarter, more efficient and more robust. Vendors also need to realize that speech is but one of several modalities, and its evolution will greatly depend on how well it blends in with the other means of communicating.

It is hard for vendors to cast their technology in a product for every market. If you have a complete portfolio and a technology that scales all the way from embedded devices to carrier scale, the challenge lies in productizing the technology. The computing budget and environmental requirements for speech interfaces vary greatly depending on whether they are running on an embedded device, an enterprise network or a carrier's network.

However, vendors can help by packaging this technology using the same set of programming interfaces and APIs. There has been a lot of help from various standards bodies such as the W3C, IETF and ETSI. It is necessary for vendors to adhere to these standards and to actively evolve them to help develop the market.

Q Who are some of your partners in providing speech technology and why did you choose those companies?
A We keep in touch with all speech companies, and have very positive relationships with many of them. Right now, we're working very closely with SpeechWorks. Our companies share a philosophy and vision. They are really good about keeping up with trends, they contribute to forums, and have the ability to meet our customers globally.

We have embedded the SpeechWorks OSR engine into the IP Unity Media Server blade to deliver a highly scalable and economical solution. We also support the AT&T Text-to-Speech engine as a software-based resource and are working closely with AT&T to embed it inside the IP Unity Harmony6000™ Media Server.

We maintain close contact with Nuance, and future releases of our media server will integrate their technology as well.

Q Tell us a little about your company and where you see yourself in a couple of years.
A The media server market is evolving very rapidly. The market based on current-generation media servers enabling voice-specific media processing will be commoditized in two years or less, and by that point we will be the platform of choice for the industry and for other people's applications. A journalist likened us to Sun in this regard, and we like the analogy. We have already started work on our next-generation media processing platform, which leverages the most recent innovations in component and media technologies to deliver a cost-effective multimedia processing platform. It will enable carriers to deliver seamlessly integrated text, sound, image and video information in a single communication channel while maintaining carrier-grade quality and ease of use. There is a new ecosystem of companies providing IP-based enhanced services to carriers, cable companies, service bureaus, call centers and even the enterprise market, and we want to work with all of these companies to be part of the revolution.
