Industry Leader Focus
Editor's Note: This Industry Leader Focus article is part of an ongoing series of stories in which speech recognition industry leaders present their views about the state of the industry and what lies ahead. The much-anticipated surge by speech recognition providers is, depending on who is speaking, either here or just around the corner. The burgeoning popularity of the Internet, coupled with mobile connectivity devices and voice-driven electronics, has renewed interest in speech recognition, propelling its associated companies to the forefront of the technological explosion. Speech is moving into the mainstream of computer technology, and that acceptance places a greater demand on hardware and software components. The speech recognition engines are the glamour applications of the industry, but the acoustical input devices, headsets and microphones play a huge role in the effectiveness of the technology. If the sound isn't clearly transmitted to the engine, without the background noise, then an accurate translation simply isn't possible. Two of the companies that have been at the forefront of acoustical provider solutions are Andrea Electronics and Emkay Innovative.
Andrea's patented Digital Super Directional Array (DSDA®) far-field microphone technology and other software enhancement algorithms, including PureAudio™, SuperBeam™ and EchoStop™, cancel background noise and transmit a clear voice signal, even when the speaker is at a distance from the microphone source, enabling a hands-free, enhanced PC communicating experience. The company's digital technologies have been and can be optimized for various mobile applications and products including mobile phones, personal digital assistants (PDAs), laptops, Internet appliances, global positioning systems (GPS), telematics systems, Auto PCs and wearable PCs, among others.
Strategic partners include IBM Corporation, Intel Corporation, Donnelly Corporation, Symbol Technologies, Inc., Clarion Corporation of America, Microsoft Corporation, DeltaThree.com, Lotus Development Corporation, fonix Corporation, HearMe, Centra Software, Inc., Logitech, Inc., Net2Phone, Inc., Voyetra Turtle Beach, Inc., and ZeroPlus.com, among others. Additionally, Andrea's Active Noise Cancellation (ANC) and Noise Cancelling (NC) technology, the first microphone technology certified by the Windows Hardware Quality Labs for use with Windows® 95, 98 and NT operating systems, has met the requirements to be certified as PC97, PC98 and PC99 compatible and has been approved for use with NetMeeting 2.0 and 2.1 and Internet Explorer 4.0.
Douglas J. Andrea, former co-president and a director of the company since 1987, was named co-chairman and co-chief executive officer with John N. Andrea. Mr. Andrea is responsible for new product and technology development and production and technology integration, which includes the company's patented microphone and earphone technologies as well as overseeing business development concerning the company's DSDA® technology. Mr. Andrea oversees the intellectual property (IP) process with outside IP counsel. Douglas Andrea has a Bachelor of Science degree in Industrial Design from Syracuse University and is a member of the Board of Directors.
Emkay was introduced by Knowles Electronics approximately four years ago as the New Business Development group, or NBD. The charter of this new business unit was to take advantage of the Knowles expertise in transducers and acoustics and expand into other markets. This expansion brought many new opportunities for the Knowles product line - from patient signaling devices in cardiac pacemakers and defibrillators to electronic exhaust systems, which reduce low frequency exhaust noise while increasing horsepower.
Emkay's expertise is drawn from over 50 years of experience in the Knowles family of companies. It has at its disposal the financial strengths and resources from Knowles companies worldwide. Emkay products are made with high quality micro-transducer technology and acoustics expertise. They provide high-performance solutions for a variety of applications and industries including telecommunications, military, medical, automotive, computers, industrial and civil communications and multimedia. Emkay offers not only a component but a finished consumer product as well.
David Ross is the Director of Sales and Marketing for Emkay Innovative Products, a Knowles Electronics LLC company, 2800 West Golf Road, Rolling Meadows IL 60008. He can be reached at 847-952-3972 or david.ross@emkayproducts.com. David's background includes working for General Motors, Hitachi and Casio in various engineering and sales positions. He has a Bachelor of Science degree in Electrical Engineering and a Master of Business Administration degree from the University of Michigan.
sT The catchphrase seems to once again be an impending explosion in the speech industry. Where will that explosion be, and when? Doug Andrea, Andrea Electronics: It's about to happen. The killer app, if we can call it that, will be in communicating with handheld devices. In line with that, Andrea is focusing on enhancing a new technology called distributed speech recognition, which is rapidly becoming a new wireless protocol. The key to that is "feature extraction," which extracts the features of speech that are most useful for recognition so they can be sent over a wireless speech recognition network. Essentially, it creates a compressed/decompressed form of speech transmission that a machine can understand, which is totally different from the more common method of telephony conversion. If you look at sales of speech dictation software right now, you see that they are starting to level off. I don't think this type of desktop dictation is going to see tremendous growth. The real growth is going to be in command and control applications. Our goal is to partner with hardware and software OEMs that are developing Internet-enabled wireless devices and speech-enabled software to embed our audio input software technology as well as apply our Microacoustics™ microphone placement techniques. These clients will be able to tap into the Internet, and they will be using speech as the interface. This distributed speech recognition format is the mode by which they will tap into this system, which I think will be the focus in the near term in talking to wireless clients.
David Ross, Emkay Innovative: The speech industry may see a period of rapid growth rather than an explosion, but what is driving the growth is improvement in both hardware and software, resulting in growth opportunities in traditional markets and new applications/markets. The growth will come from VR applications, automotive, command and control and telephony. Specifically, it will come in those areas where speech input increases the functionality of the task, improves ease of use or allows capability not previously available. Examples include controlling equipment in an operating room, Internet browsing with voice only, interactive customer service to increase call volume and automotive command and control with telephony.
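The "feature extraction" idea Andrea describes can be illustrated with a small sketch. This is not Andrea's method or any particular distributed speech recognition front end; production front ends generally use richer spectral features. The function name `frame_features`, the 8 kHz sample rate, the 160-sample frame and the two features chosen (log energy and zero-crossing rate) are all assumptions made for illustration. The point is only that the handheld device sends a few numbers per frame rather than the full waveform.

```python
import math

def frame_features(samples, frame_len=160):
    """Return one (log_energy, zero_crossing_rate) pair per full frame.

    Hypothetical, simplified front end: each frame of frame_len samples
    is reduced to two numbers.
    """
    features = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        # Average power of the frame, on a log scale.
        energy = sum(x * x for x in frame) / frame_len
        log_energy = math.log10(energy + 1e-12)
        # Fraction of adjacent sample pairs whose signs differ.
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0)
        )
        features.append((log_energy, crossings / (frame_len - 1)))
    return features

# One second of a synthetic 200 Hz tone at an assumed 8 kHz rate stands in for speech.
signal = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(8000)]
features = frame_features(signal)
print(len(signal), "samples ->", len(features), "frames of 2 features each")
```

In this toy case each 160-sample frame shrinks to two values, which is the kind of reduction that makes it practical to send speech to a recognizer over a wireless link.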
sT With all the recent emphasis on voice-enabled Internet connectivity and embedded systems in consumer electronics, what is the direction for companies like Andrea and Emkay? Andrea: We're moving in the direction of mobile connectivity, and our portfolio of technologies, running on DSP and USB platforms, has positioned us well to move into that field. Our growing software portfolio and acoustic hardware are both embeddable and scalable. You saw a lot of people come into the headset market because that was easy to do. The ability to offer a software-based audio input solution will be important for these new devices, but is not nearly as easy to create. Our engineers and scientists have spent their entire careers perfecting the knowledge to implement digital signal processing techniques as a means to achieve new levels of noise cancellation and voice preservation. This is not an easy task to duplicate. These advancements have allowed people to communicate without the need for a headset or handheld microphone, giving them true mobility and freedom.
Ross: We are involved in all areas; our business model is an Acoustic Solutions Provider. We provide the components as well as sub-assemblies/assemblies to companies looking to optimize the voice input signal. Since we manufacture the key acoustic component, the microphone, we can optimize and guarantee the quality of our voice input solutions, such as headsets, far-field microphones, embedded systems and digital interfaces (USB Adapters).
sT How do you perceive the market for acoustic providers and noise cancellation software developers at this time? Is there more competition or less in the acoustic provider market? Andrea: Again, we don't believe the future of speech recognition will be in headsets, which is where we have seen most of our competition to date. Most anyone in the headset business can make a noise-cancellation headset that is adequate for a quiet office PC application. The real future for microphone developers is with software-based solutions that are adaptive and customizable. For this, a highly skilled engineering capability is necessary. We have not found many other companies that offer truly unique microphone software technologies or that have the level of specific engineering capabilities that we have. As a result, we feel there is a lot more opportunity with a software and microphone embedding licensing model for handheld mobile devices, telematics systems, etc.
Ross: The market for acoustic providers, specifically transducers, is growing rapidly as new, smaller communication devices, PDAs and computer devices are brought to market and new embedded applications emerge. The goal of these applications is to improve functionality without increasing complexity. Voice is a natural input method to achieve this goal. This reasoning implies acoustic providers will also experience growth, along with noise-cancelling software developers, but we feel this will be embedded software rather than the more typical retail shrink-wrapped package. With many emerging/growing markets, competition naturally increases. Those companies with a solid technology base such as ours will ultimately succeed.
sT What advances do you see in the hardware end of acoustic equipment? Andrea: On the hardware side, in addition to our noise cancellation and patented active noise cancellation acoustic microphone techniques, Andrea has been focused on developing its proprietary embedded Microacoustic™ technology. This technology, which I've referred to several times in this discussion, is a proprietary acoustic technique, designed to achieve noise cancellation in a handheld device. In conjunction with this, we have a wide range of software algorithms, including noise subtraction, adaptive beamforming, beam steering and echo cancellation. Today, audio input solutions providers need to meet several stringent customer requirements, which can vary from customer to customer, some of which include a need to achieve differing levels of noise cancellation without corrupting the voice signal, meeting product design and processing requirements and offering these solutions through a highly cost-effective means. Clearly, large-scale demand is anticipated to come from companies developing next-generation mobile communications devices. The audio input solutions required for these environments are a lot more complicated than in a typical office space. We have found that the most effective results come from a hardware/software solution. As such, I think you will see a natural migration toward companies like Andrea who can offer a software capability and engineering expertise in integrating these technologies into partners' products.
Ross: The future looks especially bright for new products. We have developed and are introducing a Silicon (solid-state, micro-machined) microphone, USB and other digital interface products, far-field microphones, high performance microphone components designed for voice-on-chip and embedded applications and Bluetooth wireless products. On our technology roadmaps are continued refinements of these products as well as other new products to create new markets for speech applications. In general, people do not want to be tethered by cables and headsets. Furthermore, even wireless headsets are only suitable for certain applications. Far-field microphones with narrow directional acceptance angles, we feel, are an absolute necessity for broad market acceptance of voice recognition applications. However, to be successful, we need the economies of scale to drive down costs to give the consumer acceptable pricing and we need to improve the performance to meet the consumers' expectations. The performance of directional technology will continue to improve. However, there may also be certain physical limitations in far-field acoustics. Ultimately, the acoustic solution will depend on the application, and various acoustic solutions will co-exist. Microphone arrays are becoming commercially viable as headset-free voice input devices. These form highly directional beam patterns, which tend to be fixed in direction. Soon we will see microphone arrays with beam patterns that dynamically track the talker, allowing him/her to move freely while giving commands to a computer or embedded device.
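The beam steering and directional microphone arrays mentioned in the two answers above can be sketched with the simplest array technique, delay-and-sum. This is a generic textbook illustration and is not represented as either company's algorithm. The two-microphone geometry, the 8 kHz sample rate, the four-sample lag and the tone frequencies are all assumptions, and the frequencies are chosen deliberately so that the interfering tone cancels cleanly; real arrays face broadband noise and fractional delays.

```python
import math

RATE = 8000   # samples per second (assumed)
LAG = 4       # assumed extra travel time, in samples, for the talker to reach mic 2

def tone(freq, count):
    """A unit-amplitude sine tone, used here as a stand-in signal."""
    return [math.sin(2 * math.pi * freq * n / RATE) for n in range(count)]

def delay_and_sum(channels, delays):
    """Advance each channel by its integer delay, then average the channels."""
    usable = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[n + d] for ch, d in zip(channels, delays)) / len(channels)
        for n in range(usable)
    ]

def power(x):
    return sum(v * v for v in x) / len(x)

talker = tone(500, 404)    # arrives at an angle: reaches mic 2 LAG samples late
noise = tone(1000, 400)    # arrives broadside: reaches both mics together

mic1 = [talker[n + LAG] + noise[n] for n in range(400)]
mic2 = [talker[n] + noise[n] for n in range(400)]

# Steer toward the talker by advancing mic 2 so the talker lines up in both channels.
steered = delay_and_sum([mic1, mic2], [0, LAG])
wanted = talker[LAG:LAG + len(steered)]
residual = [o - w for o, w in zip(steered, wanted)]

print("noise power at one microphone:", round(power(noise), 3))
print("noise power after steering:   ", round(power(residual), 3))
```

A fixed beam corresponds to fixed values in `delays`. An array that tracks a moving talker, as Ross anticipates, would have to keep re-estimating those delays, which this sketch does not attempt.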
sT What about noise cancellation software? How much can be done there to improve the recognition capabilities? Andrea: Software recognition is highly dependent on the front end - the voice input device. As we've discussed, the new generation of audio enhancement microphone software being developed is having remarkable results in improving accuracy and eliminating distracting noise that can corrupt the voice signal. To be truly scalable, the solution must be able to adapt to changing noise environments. These technologies are constantly evolving and being developed to suit the emerging needs of a new grade of communications devices and software. The key to this is the software. It will be the preferred solution because it will enable the most flexibility.
Ross: Most speech recognition systems already incorporate noise-cancelling algorithms to improve their performance. Current software tends to be rather broad in application and good at reducing steady state noise. The difficulties arise when the noise is non-stationary (e.g., other talkers, slamming doors, etc.), which has a greater effect on SR systems. Many of the newer algorithms are implementing multi-microphone techniques and these are showing considerable benefits over single channel approaches. Improvements in the software will become more application specific, and in general, the rejection of unwanted noise and improvement in signal to noise ratio will continue to increase; however, the key question is whether we can ultimately reach the performance level of a good noise cancelling headset microphone.
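Ross's distinction between steady state and non-stationary noise can be made concrete with a toy single-channel detector. It is a sketch under stated assumptions and not a description of any shipping recognizer: the 8 kHz rate, 20 ms frames, tone frequencies, amplitudes and the 10 dB margin are all invented for illustration. The detector learns a noise floor from the first few noise-only frames and then flags any frame whose power exceeds that floor by the margin.

```python
import math

RATE, FRAME = 8000, 160   # assumed sample rate and frame length (20 ms)

def tone(freq, amp, n):
    return amp * math.sin(2 * math.pi * freq * n / RATE)

def frame_powers(samples):
    """Average power of each full frame."""
    return [
        sum(x * x for x in samples[i:i + FRAME]) / FRAME
        for i in range(0, len(samples) - FRAME + 1, FRAME)
    ]

samples = []
for n in range(20 * FRAME):
    frame = n // FRAME
    x = tone(100, 0.1, n)                 # steady hum in every frame
    if 8 <= frame <= 11:
        x += tone(400, 0.5, n)            # the talker
    if frame == 15:
        x += tone(1000, 0.8, n)           # a brief slam-like burst
    samples.append(x)

powers = frame_powers(samples)
noise_floor = sum(powers[:5]) / 5         # learned from the noise-only start
flagged = [
    i for i, p in enumerate(powers)
    if 10 * math.log10(p / noise_floor) > 10   # 10 dB margin (assumed)
]
print("frames flagged as speech:", flagged)
```

The steady hum is handled well, since frames containing only hum sit at the learned floor and are ignored. The burst in frame 15 is flagged along with the talker's frames 8 through 11, because a floor learned from stationary noise says nothing about a sudden transient.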
sT Voice-enabled devices are big news in the automobile industry. Can you tell us what's happening there, and where we're going as far as speech is concerned in our cars? Andrea: There is a definite need for a more robust audio solution in that area. In particular, we see automotive applications, i.e., telematics, as one of the more near-term opportunities for us. The noise problems inherent in this environment demand an extremely effective noise cancellation solution. For instance, a standard, single-element microphone sold with an existing hands-free car-phone kit does not deliver the level of quality and accuracy needed for these constantly changing noise environments, which are much harsher than those in an office. However, you can't put headsets on everyone driving cars. The answer lies in a far-field technique enabling a hands-free mobile computing environment.
Ross: The automotive applications include voice-enabled wireless telephony, command and control, Internet and rescue. To do pure recognition, far-field microphones must improve to obtain acceptable accuracy levels. In addition, high speed wireless (3G) is also important. The automotive markets offer tremendous opportunities; however, the environment is different from any other. First and foremost, developing hardware (microphones, electronics, etc.) that can withstand the severe environmental requirements is critical to success in these markets. With our Silicon microphone we believe we have a workable solution, and with its incorporation into our far-field microphone technologies, untethered, hands-free speech recognition in a car will become a reality.
sT What is the broad picture for acoustic device providers? Where is the focus now, and where will it be two years from now? Andrea: Clearly, now that Andrea is shifting toward being a provider of digital, software-based microphone solutions, we will pursue a licensing model where we would be incorporated in a host of both hardware and software speech-enabled applications. The goal is to become a household name and to be recognized as "the microphone technology provider." With speech coming to the forefront, we will be building name recognition and people will understand there is a technology and name behind the microphone. As you see more web-enabled cell phones and handheld communication devices coming to market, we will aim to become an integral part of enabling these communications, making them more user friendly, and promoting speech as the primary user interface.
Ross: The emphasis today is on aftermarket solutions and add-on accessories. In the future, with a higher level of market acceptance, the emphasis will be on embedded acoustic solutions (e.g., voice-on-chip). This will forge stronger OEM alliances, and drive costs down. This will require a strong acoustic technology base and expertise in microphones to develop the optimal solution for each application. The convergence of technologies and products, as well as the complexity of products, will create significant potential for device suppliers to offer unique, cost effective solutions. In the long run, we will be able to talk to our homes, appliances, cars, etc., increasing productivity, performance and ease of use. There is no more natural way of communicating than by voice, and all voice systems will require an acoustic input device.
Gary Moyers is the editor of Speech Technology Magazine.