Philippe Roy, Conceptual Speech
NewsBlast
Tell us a little bit about yourself. How long have you been with Conceptual Speech and in the speech industry?
Philippe Roy My university studies were in mathematics and philosophy, but I soon realized that software development was my passion, and I did exactly that for about 15 years. I was hired to work with IBM's ViaVoice product and quickly fell in love with the speech recognition industry. It was a unique field at the time, and it is still full of potential while relying on a relatively immature technology.

As I gained a deeper understanding of speech technology at IBM, I turned to my philosophy background to introduce a conceptual aspect to speech analysis. During my philosophy studies I had learned about Roger C. Schank's conceptual dependency theory. That theory is of great value to the speech industry because it lets a computer manipulate concepts as easily as it manipulates strings. Speech recognition was (and still is) lacking that conceptual aspect, basically just performing sound matching through grammars. This limits speakers to canned commands and is obviously insufficient to provide a truly useful interaction between a speaker and an automated device. I quickly realized that by introducing conceptual analysis into the speech recognition process, speakers would be free to speak in their own words and could convey ideas instead of canned sequences of words. The next three years of my life were dedicated to developing, refining and testing the use of concepts in a speech recognition process. I engineered a new breed of speech engine that reacts to calculated concepts. The result is a US and international patent application with over 400 claims that describes the technology in great detail, and a conceptual speech recognition engine and SDK, Conceptual Speech Commander, that will soon be released.

Conceptual Speech, LLC was formed over a year ago with the help of private investors. Its purpose is to transform this engineering success into a commercial success. Our business model is not to compete with existing developers of speech recognition, but rather to provide the tools and support they need to integrate conceptual speech recognition into their applications. The goal is to create a better product that gives the end-user (for example, the person calling into an automated system) a more effective and positive experience.
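To make the idea of manipulating concepts concrete, here is a minimal sketch in the spirit of Schank-style conceptual dependency. It is purely illustrative and assumes nothing about Conceptual Speech Commander's internal representation: it simply shows two differently worded utterances reducing to the same structured concept, which a program can then compare or manipulate as easily as a string.

```python
# Illustrative sketch of a conceptual-dependency-style representation.
# The primitive name and role fillers are assumptions for the example,
# not Conceptual Speech Commander's actual internals.
from dataclasses import dataclass

@dataclass(frozen=True)
class Concept:
    act: str    # CD primitive, e.g. "PTRANS" = physical transfer of location
    actor: str  # the entity performing the act
    to: str     # the destination role filler

# Two different surface wordings, one underlying concept:
# "flight 600 arrived at gate B-2" and "flight 600 got in at B-2"
from_wording_a = Concept(act="PTRANS", actor="flight 600", to="gate B-2")
from_wording_b = Concept(act="PTRANS", actor="flight 600", to="gate B-2")

# Concepts can now be compared and manipulated as easily as strings.
assert from_wording_a == from_wording_b
```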
NB
Why would Conceptual Speech Commander SDK be helpful to someone working on a speech recognition project?
PR Conceptual Speech Commander and its SDK are the first commercial expressions of conceptual speech recognition. It is a paradigm shift for the speech industry, since it is the first product that performs speech recognition by calculating a concept from an utterance. This is a major thing for the speech industry. Conceptual speech recognition is capable of achieving many goals that are, and will stay, unattainable through grammars.

First, with conceptual speech recognition a speaker can state a command in his or her own words, because Conceptual Speech Commander recognizes configurations of concepts, so asking for two, three or more things at the same time is possible. For example, someone calling an airline scheduling system over a normal phone line can ask "when and where did flight 600 arrive and how late was it." There are actually three questions contained in that utterance. The user is asking 1) when did flight 600 arrive; 2) where did flight 600 arrive; and 3) how late was flight 600. With conceptual speech recognition, it is possible for the system to answer "United Airlines flight 600 already arrived at gate B-2 of Dallas Fort Worth International Airport in Dallas 20 minutes ago and the arrival was late by 12 minutes." Handling a complex question like this is not possible using conventional grammars.

An additional benefit is significantly improved accuracy, since a conceptual speech recognition system is not looking for words. It is looking for a valid concept and uses words only to build concepts. This enables our technology to work better than the older technologies with non-ideal sound, like what we find over telephone lines or radio waves. In more technical terms, a conceptual speech recognition system searches for the right phoneme within the set of highly probable phonemes for each time slice and can tolerate mistakes, whereas a grammar-based speech recognition system depends on the unique correct phoneme being detected for each time slice (while using its grammar bias).

Furthermore, a well-implemented conceptual speech recognition site can also detect the mood of an inquiry if desired. That allows responses to be produced that match the mood of the caller in order to provide a better flow of conversation. For example, a conceptual speech recognition system can detect whether the caller is speaking in a business-like or polite tone, and responses can be formulated to match the caller's mood accordingly.

There is much more to conceptual speech recognition. It is a fresh start for the speech industry, an opportunity to build upon a new technology that already does more, and does it better, than what has been considered state-of-the-art speech recognition until now. It is also expected that conceptual speech recognition will mature in the next three to five years and transform itself into something even more powerful. On the commercial side, I strongly believe that conceptual speech recognition will enhance, and ultimately replace, grammar-based speech recognition systems within the foreseeable future. Businesses that migrate to this new technology early in its life cycle will be better positioned to take advantage of its potential and capture the market share that will escape the businesses that have waited for the inevitable. In that sense, we think that forward-looking businesses should carefully explore conceptual speech recognition and give serious consideration to making a transition sooner rather than later.
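As a rough illustration of the flight 600 example above, the sketch below shows how a compound utterance, once reduced to concepts, can be expanded into independent queries for a back end to answer. The class and field names are hypothetical and are not part of the Conceptual Speech Commander SDK.

```python
# Hypothetical sketch of decomposing one compound question into several
# concept-level queries. All names here are illustrative assumptions.
from dataclasses import dataclass
from typing import List

@dataclass
class ConceptualQuery:
    subject: str   # the entity the caller is asking about
    aspect: str    # which attribute of that entity is requested

def expand(subject: str, aspects: List[str]) -> List[ConceptualQuery]:
    """Split a compound question into one query per requested aspect."""
    return [ConceptualQuery(subject, a) for a in aspects]

# "when and where did flight 600 arrive and how late was it"
queries = expand("flight 600", ["arrival_time", "arrival_gate", "delay"])
for q in queries:
    print(q)  # three independent questions recovered from one utterance
```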
NB
How long does it take to incorporate the Conceptual Speech Commander SDK into an existing application?
PR Producing a conceptual speech recognition system involves learning a new way of performing speech recognition, so the first projects take longer. After two or three systems have been developed and implemented, the learning curve will flatten and production time will approach what is experienced today with conventional speech recognition systems.
NB
What skills are necessary to utilize conceptual speech recognition?
PR The same engineers who produce today's speech recognition systems are able to produce a conceptual speech recognition system. Besides good code-writing skills, they should also have some knowledge of linguistics. Syntactic analysis is relatively important in conceptual speech recognition, so one needs to have some knowledge of parts of speech and be able to build the vocabulary accordingly. It is not rocket science. English is my third language and I've managed to do it. Finally, the software engineer must become familiar with a new language, which is used to build "predicate builder scripts." This is really not difficult either, but it is a necessary part of our technology used in conceptual analysis.
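Since the actual predicate builder script language ships with the SDK and is not shown here, the following Python sketch only mimics the general idea: each (word, part-of-speech) pair contributes a small piece to the concept being built. Every name in it is an assumption made for illustration, not the SDK's syntax.

```python
# Hypothetical illustration of per-word, per-part-of-speech builders
# that assemble a concept from recognized words.
from typing import Callable, Dict, Tuple

Predicate = Dict[str, str]
Builder = Callable[[Predicate], Predicate]

lexicon: Dict[Tuple[str, str], Builder] = {
    # verb "arrive": contributes the act of the concept being built
    ("arrive", "verb"): lambda c: {**c, "act": "ARRIVE"},
    # noun "flight": contributes the actor role
    ("flight", "noun"): lambda c: {**c, "actor": "flight"},
    # adverb "late": contributes a delay aspect
    ("late", "adverb"): lambda c: {**c, "aspect": "delay"},
}

# Folding the builders for the recognized words yields a concept.
concept: Predicate = {}
for key in [("flight", "noun"), ("arrive", "verb"), ("late", "adverb")]:
    concept = lexicon[key](concept)

print(concept)  # {'actor': 'flight', 'act': 'ARRIVE', 'aspect': 'delay'}
```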
NB
What customers are using Conceptual Speech Commander SDK? What has the response been?
PR In any developing industry, there is a tendency for people to become jaded with revolutionary developments that don't live up to their potential. So when the real revolution comes, many tend to look the other way until it is forced upon them. The speech recognition industry is no different, and we need to live with it. But as in any other industry, there are people who can spot the true revolution and jump on board early. These people make businesses thrive and flourish.

That being said, there are people within and outside the industry who recognize that conceptual speech recognition is truly a revolution for the speech industry. The investors who provided our seed capital have experience with intellectual property and speech recognition, and they recognized the potential over a year ago. More recently, some key players in the industry, whom I cannot name because we are either in negotiations or bound by contract, have had the vision to explore our technology and get started on the use and promotion of conceptual speech recognition. In the near future, one such relationship will be announced. Since the technology is relatively new and our Conceptual Speech Commander SDK is yet to be released, there are no current implementations other than our demonstration of an airline response system. Our Web site contains a lot of documentation regarding our technology, and we are dealing with a handful of businesses that are investigating implementing an application. So in general, the response has been positive. Although few individuals have so far developed a thorough understanding of the technology, the good news is that the people who have done their homework are impressed. So I guess the challenge is on us to continue educating. In the short term, we will do this by producing some convincing sites, an SDK (Conceptual Speech Commander) and some associated products so that the market can see and come to fully understand the broad capabilities of our new technology.
NB
What is the pricing structure for Conceptual Speech Commander SDK?
PR We are still developing our pricing structure and will announce it concurrently with the release of the Conceptual Speech Commander SDK. But I can state that our pricing will be competitive, considering the value our technology will add to a speech recognition application.
NB
What separates Conceptual Speech's product from your competitors?
PR A complete generation of speech recognition technology separates us from the competition. No one out there performs speech recognition like we do. That being said, our business model doesn't call for us to compete with developers of speech applications, but instead to cooperate with them and provide the technology and tools they need to take their products to the next level. Most of today's speech recognition businesses already do a terrific job; we can only imagine what they could do with a technology like conceptual speech recognition! None of us is as strong as all of us. That is why we chose to focus on developing an SDK instead of applications. With our technology, we think speech application developers will be able to do more, and do it better, and keep improving their businesses.
NB
Discuss your future product releases.
PR Conceptual speech recognition needs to go through the same phases of evolution that current state-of-the-art speech recognition has been through to get to where it is today. That is, first we need to master limited vocabulary command and control speech interactions. Conceptual Speech Commander takes good care of that. Although it is the easiest implementation of speech recognition, it is also the most rewarding, since it addresses most needs of today's market. IVR systems are an example of command and control speech recognition systems. Next on the list is addressing the needs of embedded devices which could use simple command and control speech recognition, Tablet PCs and so on.

Longer-term plans include developing high-accuracy, mission-critical limited vocabulary dictation systems that use conceptual speech recognition, and we'll probably spend some energy on this aspect of implementation in the medical and legal fields. Another area we are looking at is the mining of audio and data for concepts. This could have significant possibilities in research and national security, and we are having discussions with people interested in these applications. Furthermore, imagine a search engine that can search audio broadcasts and Web pages for concepts rather than simply looking for keywords.

Finally, in order to close the loop, a large vocabulary dictation system using conceptual speech recognition is also achievable. As a matter of fact, those dictation systems can already be done. The challenge is not an engineering one, but a scalability one. In order to produce a dictation system that uses conceptual speech recognition, the required vocabulary needs to be defined conceptually. That is, a word is no longer limited to a spelling and a pronunciation; it also has a predicate builder script associated with each part of speech. Consequently, in order to produce a large vocabulary dictation system, that large vocabulary must be defined conceptually. That means a lot of work… As Ferdinand de Saussure said, "A word is a linguistic entity that only acquires meaning when put in relationship with other words." Imagine defining conceptually each word of a 125,000-word database where every word can be put in relationship with every other. There are also other obstacles to overcome, like the processing time and power needed for such a large conceptual domain. But just as current hardware has evolved to support current speech recognition systems, future advances in processing power will make trivial what we perceive as a challenge today. So a large vocabulary dictation system is a doable task, but it will take time. As a by-product, such a system should eventually also be able to provide automated punctuation without the need to state commas and periods in the dictated content.
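To give a feel for what "defining a vocabulary conceptually" might involve at scale, here is a hedged sketch of a lexicon entry that carries a pronunciation, a per-part-of-speech predicate builder reference, and Saussure-style relations to other words. The structure and every field name are assumptions made for illustration, not the SDK's actual data model.

```python
# Hypothetical shape of a conceptually defined lexicon entry.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Sense:
    part_of_speech: str
    predicate_script: str                             # builder script for this sense
    related: List[str] = field(default_factory=list)  # words it acquires meaning from

@dataclass
class LexicalEntry:
    spelling: str
    pronunciations: List[str]
    senses: List[Sense]

lexicon: Dict[str, LexicalEntry] = {
    "arrive": LexicalEntry(
        spelling="arrive",
        pronunciations=["ax r ay v"],
        senses=[Sense("verb", "build_arrival", related=["depart", "gate", "late"])],
    ),
}

# Scaling this to a 125,000-word dictation vocabulary means authoring
# senses and relations for every headword -- the effort described above.
print(len(lexicon), "of ~125,000 entries defined")
```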
NB
What are your expectations for the speech industry in 2004?
PR The industry will continue to expand and evolve, and we plan to start playing a significant role in this. In 2004, we will ship Conceptual Speech Commander and its SDK, and our goal is to end the year with three to five major implementations of systems incorporating conceptual speech recognition. We look forward to cooperating with the businesses that choose our SDK, and we will do whatever it takes to help them succeed in their deployments. After all, it is these successful deployments that will help the market realize the full potential of this new technology. In the short term, we may develop and deploy some applications on our own in order to help the market see the true revolution in this technology. But our long-term plan is to leave space for the market to implement conceptual speech recognition applications while we focus on our long-term goals.