November 1, 2007
By Tom Houwing Director - voiceandvision B.V.
Talking Tech

This Is Your Wake-Up Call

About eight years ago the speech industry came out of its initial phase, more or less, and began focusing on industrialization. Expectations were high and analysts predicted a golden future, with an exponential increase in the use and adoption of speech recognition-based applications in daily life. Speech was bound to become the next revolutionary development after the Internet.

With the technical development of the Internet and all kinds of software that enable us to write, calculate, present, and draw the most impressive professional documents, we entered the epoch of modern automation. Improved ways to create infrastructures that realized integration of different back-end systems with databases and call-center desktops contributed as well.

But the information and communication technology (ICT) community soon discovered that proprietary approaches to putting together new systems led to isolated solutions; a more standardized, reusable, modular approach was needed to bring overall development up to speed. An illustrative example is the development of Web sites. At first, special skill sets were required; now preconfigured building blocks guide the way to developing quite acceptable Web sites, requiring no technical knowledge.

Similarly, new automation promised solid potential. Naturally, speech recognition was adopted within the ICT community, with its own mindset and skill set in which technical feasibility overshadowed usability. At first, with rather old IVR systems that basically allowed only single-word recognition, we could say “one” instead of keying it in, or create verbatim prompts like “If this information is correct, say ‘continue.’” Solidly developed systems showed good results as far as speech recognition was concerned, but they still did not attract end users on a larger scale.

Pretty soon an analogy between the Internet and speech was born. Many who knew, experienced, or developed ICT technology now pointed out that speech needed standardization and a modular approach as well. What worked for Web sites had to work for voice as well. A procession of new companies with true faith in the future of speech marched forward into developing speech blocks and preconfigurable voice modules next to voice application management systems and more.

Following the expected trend, early adopters invested extensively in this new voice software and middleware. With their new voice platforms ready to take on the world, most of these companies got stuck, as professional skill sets failed not only to lead voice automation projects but also to design and produce an efficient, user-centric VUI, which is required to turn a man-machine interaction into a positive user experience. ICT-focused companies invested a lot in what they already knew and shied away from anything they didn’t.

More or less lost in a jungle of new components required to make speech recognition work, a lot of companies were unable to conduct voice projects in an efficient and cost-effective way. Therefore, speech recognition-based applications were not ready to revolutionize the world of automation, and the expected explosion did not take place.

Facts Today
Intelligent back-end integration automates and optimizes all kinds of processes. For example, many utility companies today have fully automated billing. Other systems allow us to take care of our banking from behind our computer 24 hours a day.

Modern automation seems to work well, and with an already developed infrastructure and back-end integration, additional speech automation can enhance already existing services. Web sites will become voice-enabled. In some cases business decision-makers want speech recognition to take the place of call center agents and even deliberately block the way to an operator to save costs. Some believe speech applications provide an increased service level.

To make speech applications work, we need to respect a few basic facts:

The mindset of people who are going to read differs from that of people who are going to talk.
Written language differs from spoken language.
When it comes to digesting information, we are raised with graphical user interfaces, like books, pictures, and film.
Information that comes at us as audio hits the short-term memory first and only sticks when repeated.
Purely audio information stimulates our imagination.
Human verbal communication is based on an inherited language instinct that goes way back.

Working with these facts, which create a framework pointing out possibilities and limitations, requires professional attention. Efficiently meeting the mindset of callers within an auditory environment calls for communication expertise. Writing spoken language requires professional training and experience.

There are many ways to verbally explain information and yet only a few that suit an effective VUI. Prompts need to be self-explanatory, short, and to the point to support the man-machine dialogue in the best way possible.

At the same time, descriptions of business concepts often are stuffed with internal jargon that must be reworded to become understandable for end customers. Composing prompts that are in tune with callers’ mindsets requires user and call analysis. Understanding business processes and finding the right way to mold these processes into an efficient VUI that generates caller acceptance requires professional requirements-gathering and analysis. To achieve a return on investment, the public has to use these speech applications; to make sure they are going to be used, the VUI specialist needs to focus on a realistic and user-centric design that generates caller acceptance and a positive user experience.

Although they are in a very decisive role, VUI specialists are rarely in the lead of speech automation projects and almost never a part of the initial team to examine the possibilities of automation through speech recognition-based systems. Most of the time, the VUI designer is called when a project already has started and processes already have been defined. Too often professional VUI design is being reduced to ensuring that the intended functionality somehow fits within the predefined system structure.

Project leaders and those in charge of making decisions, especially during the requirements-gathering and analysis phase, might not be able to grasp VUI-related issues like an efficient dialogue structure and well-composed prompts. These considerations simply do not fit their skill sets. VUI design processes are not yet standardized, established processes applicable to various solutions. Today too many speech applications are designed without taking the VUI into serious and professional consideration. Therefore, they fail to meet caller expectations.

Mainstream Technology
One of the major reasons speech applications aren’t mainstream technology is poor recognition. Callers can’t get through, resulting in low first-call resolution. This is not because speech recognition doesn’t work. It’s because the entire process of deploying a speech application isn’t really understood. Too many voice projects are subjected to a menu structure that is basically marketing-driven and created out of internal company processes that have no direct relation to the end user.

In some projects, tuning takes place while the system is going into production. This most definitely results in solid recognition. However, by the time an acceptable performance is established, the process has scared away most potential end users.

Other projects are carried out just because companies invested in expensive voice platforms and need to proceed with an outdated automation plan to meet artificial deadlines against which their personal performance is being measured.

And yet there are others who read some articles. Now convinced that they know something about the market, they pull together some directionless benchmarking data, regardless of technical possibilities and limitations. They might even ask for an open-ended prompt like “How may I help you?”, which has nothing to do with a professional analysis of the speech application in the first place.

In fact, we aren’t even close to being mainstream. Face it: Most of us do not use speech applications in our daily lives. For this to change, companies need to understand that VUI-related issues, especially during the setup of a voice project, are a key point in regard to caller acceptance and positive user experience. Project leaders and decision makers need to realize that VUIs have their own requirements and call for professionalism.

Some vendors and solutions providers are looking for possibilities with speech for the day after tomorrow, which is fine as long as we stay focused on solving yesterday’s problems as well. If we don’t, we’ll find ourselves building the future of speech on suspicious grounds.

…And Tomorrow?
And still, in spite of huge potential and great user-friendly possibilities, too many voice applications are not convincing. We might have lost sight of those who are going to use these new systems and become entangled in hierarchical structures, internal processes, administration, and overzealous communication. In time, a truly prosperous future of speech automation is getting constructively destroyed from within, while future decision-makers shake their heads as they evaluate the results of their colleagues.

So this is our wake-up call. We need to respect that voice projects require specific project setups and professional skill sets spanning various disciplines if we are to bring speech automation to success and maybe one day turn it into a mainstream technology.

Tom Houwing is director of voiceandvision, a VUI design firm based in the Netherlands. He can be reached at tom@vui-experts.com.
