March 1, 2011
By Sara Basson, Program Director, Speech Transcription Strategy - IBM Research
The View from AVIOS

Spoken Web for the Developing World

Speech technology has worldwide appeal, but requirements differ dramatically. In the developed world, the technology is a tool to automate services that are currently offered with human intervention. Alternatively, speech recognition can help develop services that have never been offered without human intervention. The focus is either cost savings or new service delivery—with cost saving implicit even in “new service delivery,” since the services are proposed without hiring humans to mediate.

In the developing world, especially India, the value propositions change. Labor is not seen as an excessive cost that must be removed from the equation. On the contrary, the goal is to create jobs. Services that could be automated are strategically not automated in order to provide employment opportunities.

Other challenges exist in designing speech services for bottom-of-the-pyramid populations, who often are unfamiliar with automated services. In one pilot deployment, IBM India lab designers said users got confused by prompts and other prerecorded information, not knowing they had to wait their turn to speak. One solution was to interleave the prompt portion with music to make it clear that the caller was not speaking to a representative.

Another observation from the lab designers was users’ preference for directed dialogue over mixed-initiative conversation. The mixed-initiative “How may I help you?” style appealed to savvy users who could request what they wanted without traversing several menus. For rural novice users, however, open queries were confusing, and the callers didn’t have a good mental model of what they could request. As such, structured directed dialogue was preferred, even though it prolonged the interaction. These users were also cost-sensitive and conservative in their use of telecom minutes, so designers had to balance fully informing users against keeping prompts to the point.

India has more than 20 official languages and multiple local dialects, complicating generic system design. And speech-recognition models don’t exist for all of the languages required.

In recent months, I have come to better appreciate the global promise and challenges of automated speech systems because I am working with IBM Research India on a project called Spoken Web. It was conceived to address the needs and wants of populations at the bottom of the pyramid.

Worldwide, 800 million people are illiterate, according to 2010 figures from the UNESCO Institute for Statistics. According to 2008 World Bank figures, at least 80 percent of the world’s population lives on less than $10 a day. Because electricity is not predictably available, Internet access is limited. People can’t afford computers and couldn’t read the information posted even if computers were made available.

In the past decade, though, mobile phone use has risen sharply. In India, 483 million people are active subscribers, the Telecom Regulatory Authority of India says. Analysts project a deeper penetration in the coming years, as prices dip further. The phones owned by the bottom-of-the-pyramid users are of the simple variety; smartphone penetration lags.

Spoken Web was designed to provide the benefits of the Information Age to bottom-of-the-pyramid populations, using speech technology and simple cell phones. Designers must first determine which services rural populations actually want. Clearly, voice-based access to posh department stores would not rank high. Most of the information of interest is local: shop owners, weather, and emergency information. And the local topics would not be available even if these users had full access to the Web.

Therefore, Spoken Web was designed to enable rural users with no technical expertise to create their own web of information using a simple phone. Local entrepreneurs create “voice sites” (analogous to Web sites) that can be accessed and hyperlinked to other voice sites. IBM Research India has conducted promising pilots predicated on Spoken Web. Farmers have been able to get information about pesticides, weather conditions, or the fair market price for their goods. The potential impact is enormous.

User-centric design for bottom-of-the-pyramid populations demands rethinking current models for information provisioning in the developed world. Entrepreneurs must be able to post information. Users need an access mechanism leveraging available tools and technologies. In the developed world, speech application providers often muse about whether speech is the interface of choice for a particular application. But in the developing world, speech is a critical modality for pervasive information access. We anticipate that Spoken Web and services like it will pave the way.

Sara Basson, Ph.D., is program director of speech transcription strategy at IBM Research. She can be reached at sbasson@us.ibm.com.
