How Speech Makes OXYGEN
Will speech recognition be the "Oxygen" of the next generation of computing? Researchers at the Massachusetts Institute of Technology envision a world where computing devices will be as ubiquitous as oxygen. Scientists at the MIT Laboratory for Computer Science (LCS) are working on an ambitious program that could reinvent the way we use information technology.
The Oxygen program was announced at the 35th anniversary of the MIT Laboratory for Computer Science, an occasion that also featured a keynote speech on "The Future of Software" from Bill Gates, president and CEO of Microsoft. Gates was also on hand to announce a donation of $20 million by his foundation for the construction of a building that will be the new home for the LCS. The William H. Gates Building, as it will be known, is expected to be completed in 2003.
"The MIT Laboratory for Computer Science is one of the most important centers for computer research in the world. I'm happy that this gift will be used to support continued innovation in computer science and the groundbreaking research that the LCS is known for," said Gates.
Gates also commented on the future of the Internet and put forth a vision of computing in which speech will play a vital part. While acknowledging that speech recognition has proved to be a much more difficult problem than originally expected, he felt certain that in "the next decade we will see great progress in (developing software that) mimics the senses." Most of his comments centered on the explosive growth and even greater potential of the Internet, which he predicted would become a more common method of transacting business than the telephone within five years. "We must have speech recognition," Gates said, "to make these Internet systems work." He specifically cited the work of Dr. Victor Zue, director of the Spoken Language Group at the center, in making speech recognition part of the user interface.
Speech and Oxygen
Speech recognition is also vital to the future of the Oxygen project, which was unveiled at the anniversary celebration. Oxygen represents a much different vision of what computing can be, and it is very hard to imagine how it could work without speech technology. Right now, Oxygen is a five-year, $40 million research project pursued by the LCS and Artificial Intelligence Laboratories at MIT and has an "overarching purpose to let people do more by doing less," according to Michael L. Dertouzos, director of the LCS. The program is funded by the Defense Advanced Research Projects Agency (DARPA). The Oxygen concept depends on eight hardware-software technologies. The four core technologies are:
- Handy21, a portable device that Oxygen users would carry.
- Enviro21, a system in an office wall, car trunk, or home basement.
- N21, a network to link all Oxygen devices together.
- Spoken dialog software.
There are also four "user technologies" involved in Oxygen.
- Knowledge-access technology.
- Automation technology to "offload from your brain."
- Collaboration technology.
- Customization technology.
Handy21 is envisioned as a portable, universal device that will look like a cell phone, but will also have a small screen, a camera, a Global Positioning System (GPS) module, an infrared detector and a powerful computer. Almost everything in the device will be software controllable. It could be viewed as a communications chameleon because the unit can change from a cell phone to a two-way radio, a TV, a beeper, a handheld computer, a pointing device and more.
The Enviro21 bears the same relationship to the Handy21 that power sockets do to batteries. Enviro21 units will mimic the Handy21, but will have far greater storage capacity, processing power and communication speed. Many of them will also be connected to sensors and actuators, so that they can raise the room temperature, operate a fax machine and tell if the door is open.
The N21 network links all the Oxygen devices to each other and to the world's networks, and thus creates secure regions within which users can work and communicate.
Spoken dialog software is built deep into Oxygen, and is necessary to make Oxygen natural and easy to use. Dr. Zue, MIT's chief speech expert, told the attendees at the 35th anniversary celebration that speech is the "ultimate interface," and allows us to make our computers behave more like humans. He told the group that conversational interfaces are emerging that recognize and understand speech (in narrow domains) much like a person would. The goal, he said, was multilingual, real-time communication using standard PCs. Three projects from Dr. Zue's group demonstrate speech recognition within specific domains: Jupiter, which provides weather information; Pegasus, which provides flight status; and Voyager, which provides traffic reports.
Besides the four core technologies, MIT is also developing four user technologies, which are important to making the Oxygen concept a reality.
- Knowledge-access technology lets users find the information they need, in their own way, among their own data, including the Web.
- Automation technology allows users to handle certain routine work by using computers.
- Collaboration technology helps a group work together by tracing the discussion and keeping an accessible trail of issues, documents and conversation fragments.
- Customization technology adapts Oxygen to the needs of individual users and ensures that all software is downloaded automatically to all devices when new versions have become available, errors have been detected or users have asked for new capabilities.
Oxygen represents a new concept in computing and could, if successful, mean a radical departure from today's desktop environment. The ultimate goal is to make machines more like humans. According to Dr. Dertouzos, "by capturing the human utility of new technology, Oxygen should encourage application developers and users to bend machines toward human needs, rather than the other way around."
Brian Lewis is the editor of Speech Technology magazine.