Weighing the Possibilities of the Scalable Language API
Why would the world need another API that deals with speech?
In our view, another API is required because the previous APIs address only half the job. The SAPI, SRAPI and Apple APIs deal with speech recognition and speech synthesis but they leave the application developer to write a great deal of code if they intend to process natural language sentences.
Even a very simple application will need to deal with several complex linguistic issues. If there is no facility in the API to manage these issues, the developer is left to sort it out. The Scalable Language API (SLAPI) is a broad API; it addresses the full problem of natural language - including speech.
Suppose you have a great application that could use spoken natural language sentences. The Class 1 APIs - SAPI, SRAPI, Apple - can return a string of words to your application. But then what? You have not truly done anything significant if you do not somehow extract meaning from that sequence of words. Once you have a sequence of words, what is next?
At first glance, you might assume that you could simply hard-code a few sentence templates, treating the noun or verb in each as a variable. Consider a calendar application for checking and making appointments. Such a template might look like this:
When is my appointment with ___?
A closer examination reveals that there are many ways that the user could ask this or closely related questions:
What time is my appointment with ___?
When am I supposed to meet ___?
When is my meeting with ___?
Is my meeting with ___ today?
When and where am I supposed to meet ___?
You could force the user to use only certain pre-set sentences. While this would make your application easier to write, it would make it harder to use. The user would quickly become frustrated at having to memorize the few sentences your application can handle. This approach puts the burden on the user to conform to the application.
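To make the limitation concrete, here is a minimal sketch of the hard-coded template approach. It is an illustration only: the regular expression, the function name match_template, and the use of "George" to fill the blank are assumptions made for this example and are not part of any of the APIs discussed.

```python
import re

# Hypothetical hard-coded template for the calendar example. The blank
# (___) in the template becomes a named capture group.
TEMPLATES = [
    re.compile(r"^when is my appointment with (?P<person>\w+)\?$", re.IGNORECASE),
]


def match_template(sentence):
    """Return the captured name if any template matches, else None."""
    for pattern in TEMPLATES:
        m = pattern.match(sentence.strip())
        if m:
            return m.group("person")
    return None


# The original template plus the five variations listed above.
QUESTIONS = [
    "When is my appointment with George?",
    "What time is my appointment with George?",
    "When am I supposed to meet George?",
    "When is my meeting with George?",
    "Is my meeting with George today?",
    "When and where am I supposed to meet George?",
]

results = {q: match_template(q) for q in QUESTIONS}
unmatched = [q for q, person in results.items() if person is None]

for question, person in results.items():
    print(f"{question!r:50} -> {person}")
```

Only the first question is recognized. Each variation needs another hand-written template, and every new template has to be written again for each language the application supports.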
To get from words to concepts, the sentence will need to be parsed. The syntax of any language is defined in a grammar. If you read a BNF grammar for a programming language such as C, you see that it is quite intricate and involves many rules. However, such a grammar pales in comparison to even a limited grammar for any natural language (e.g. English, French). Even a limited grammar will involve hundreds of rules. Moreover, the grammar for C was designed to be unambiguous.
Natural language grammars involve both ambiguity and agreement rules. Beyond this there are complex linguistic issues of morphology (the possible forms of a word). French has more than twenty forms of a regular verb. The Scandinavian languages have twelve forms for nouns. Many languages have multiple grammatical genders, and some have complex case systems. While the Class 1 APIs do include some facilities for writing simple BNF grammars, none of them adequately addresses the complex issues involved in writing a practical natural language grammar.
For these reasons, writing a parser for a natural language grammar is far more complex than writing a parser for C. Often a parser will begin to parse a sentence one way, realize that it made a faulty decision, and have to back up to that point and try another alternative. This is called backtracking. It is a major issue in parsing natural language but rarely arises in parsers for simple, unambiguous grammars. To minimize backtracking and thus increase efficiency, N-word look-ahead is often used. There are other issues to deal with as well, such as feature inheritance. All in all, writing an efficient natural language parser is not for the faint of heart.
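Backtracking is easiest to see on a toy example. The sketch below is a deliberately tiny recursive-descent parser, not the SLAPI parser or any vendor's implementation. The three-rule grammar, the four-word lexicon, and all function names are assumptions chosen for illustration. The test sentence is the well-known garden-path sentence "the old man the boats", in which "old" can be an adjective or a noun and "man" can be a noun or a verb.

```python
# Toy lexicon: each word maps to the parts of speech it may take.
LEXICON = {
    "the": {"Det"},
    "old": {"Adj", "N"},
    "man": {"N", "V"},
    "boats": {"N"},
}

# Toy grammar: alternatives for each symbol are tried in the order listed.
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("Det", "Adj", "N"), ("Det", "N")],
    "VP": [("V", "NP")],
}


def parse(symbol, words, pos, trace):
    """Yield (tree, next_pos) for every way `symbol` can match at `pos`."""
    if symbol not in GRAMMAR:  # a part-of-speech category
        if pos < len(words) and symbol in LEXICON.get(words[pos], ()):
            yield (symbol, words[pos]), pos + 1
        return
    for rule in GRAMMAR[symbol]:
        trace.append((symbol, rule, pos))  # record every attempt
        for children, end in expand(rule, words, pos, trace):
            yield (symbol, *children), end


def expand(rule, words, pos, trace):
    """Match the symbols of `rule` in sequence, starting at `pos`."""
    if not rule:
        yield [], pos
        return
    for tree, mid in parse(rule[0], words, pos, trace):
        for trees, end in expand(rule[1:], words, mid, trace):
            yield [tree] + trees, end


def parse_sentence(sentence):
    """Return (tree, trace); tree is None if no full parse exists."""
    words = sentence.lower().split()
    trace = []
    for tree, end in parse("S", words, 0, trace):
        if end == len(words):
            return tree, trace
    return None, trace


tree, trace = parse_sentence("the old man the boats")
print(tree)
for symbol, rule, pos in trace:
    print(f"try {symbol} -> {' '.join(rule)} at word {pos}")
```

The trace shows the parser first reading "the old man" as a noun phrase, finding no verb after it, and then backing up to read "the old" as the noun phrase and "man" as the verb. A parser with one word of look-ahead could reject the first reading sooner, because the word following "the old man" cannot be a verb.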
Natural Language
How long would it take you to write an efficient natural language parser? How long would it take to write a simple grammar of English? When you are ready for a Korean or Japanese version, how long will it take you to write those grammars? By this point you are probably wishing that someone else would take care of the linguistics issues so that you can focus on the application features. That is the philosophy of the Scalable Language API.
The framework of the Scalable Language API allows vendors to provide the language related components - grammars, lexicons, parser, speech recognition engine, speech synthesis engine, and inference engine - in a modular, interchangeable fashion. As the developer, you select the languages, vendors, and components for your system. SLAPI compliance guarantees that they will be compatible.
The greatest advantage of SLAPI to the developer is that of working at the conceptual level rather than at the detail level. The Class 1 APIs force developers to handle the linguistic intricacies themselves. As a Class 2 API, SLAPI frees the developer from these intricacies. This is accomplished with a meaning representation, InterLing.
InterLing is a simple notation for expressing natural language concepts. Unlike natural languages, its syntax is both simple and very regular. There are no exceptions to handle. Verbs have only one form; there are no declensions. A few high-level rules written in InterLing can replace pages of C code.
Calendar Application
The easiest way to get an overview is to see an example. Let's take a look at a simple application written using SLAPI. Assume that we want a calendar application. The user can verbally ask a question about an appointment, and the application responds.
Here are some sample queries that your application will accept, along with the replies it generates automatically.
When is my meeting with George?
What time is my meeting with George?
Your meeting with George is at 3:00 on Friday.
When am I supposed to meet George?
You're supposed to meet George at 3:00 on Friday.
Is my meeting with George today?
Yes, your meeting with George is at 3:00 today.
All parsing of English grammar rules is left to the SLAPI engines. The developer does not have to write any of it. SLAPI handles both analysis and synthesis, in both speech and text. It automatically transforms "When am I supposed to" into "You're supposed to" in the reply. Moreover, the integration between the speech engines and the parsing engine is seamless as far as the developer is concerned.
All of this is handled behind the scenes by the SLAPI manager. The speech recognition engine may come from one vendor, the speech synthesis from another, and the parser from yet a third. Because of the modular architecture of SLAPI they all work together. This gives great flexibility to the developer in selecting the best components.
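The idea of interchangeable components can be sketched in a few lines. This is not the SLAPI interface: the class names, method names, and stub vendors below are all hypothetical, and the stubs only pretend to process audio. The sketch shows just the design principle, that a manager written against small interfaces can combine engines from different suppliers.

```python
from typing import Protocol


class Recognizer(Protocol):
    """Hypothetical interface for a speech recognition engine."""
    def recognize(self, audio: bytes) -> str: ...


class Parser(Protocol):
    """Hypothetical interface for a natural language parser."""
    def parse(self, text: str) -> dict: ...


class Synthesizer(Protocol):
    """Hypothetical interface for a speech synthesis engine."""
    def speak(self, text: str) -> bytes: ...


class VendorARecognizer:
    def recognize(self, audio: bytes) -> str:
        # Stub: treats the "audio" as already-transcribed UTF-8 text.
        return audio.decode("utf-8")


class VendorBParser:
    def parse(self, text: str) -> dict:
        # Stub: a real parser would build a meaning representation.
        return {"words": text.rstrip("?").lower().split()}


class VendorCSynthesizer:
    def speak(self, text: str) -> bytes:
        # Stub: returns the text as bytes instead of a waveform.
        return text.encode("utf-8")


class Manager:
    """Wires the engines together; the application sees only this class."""

    def __init__(self, recognizer: Recognizer, parser: Parser,
                 synthesizer: Synthesizer):
        self.recognizer = recognizer
        self.parser = parser
        self.synthesizer = synthesizer

    def understand(self, audio: bytes) -> dict:
        return self.parser.parse(self.recognizer.recognize(audio))

    def reply(self, text: str) -> bytes:
        return self.synthesizer.speak(text)


manager = Manager(VendorARecognizer(), VendorBParser(), VendorCSynthesizer())
print(manager.understand(b"When is my meeting with George?"))
```

Swapping in a different vendor's engine means passing a different object to Manager. The application code that calls understand and reply does not change.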
What about error handling? SLAPI has a built-in mechanism for catching logical errors and responding to them. The question "When is my meeting with George?" rests on assumptions made by the user: that such an appointment exists, and that George is unambiguously understood. With a few rules, the SLAPI inference engine can detect when these assumptions fail and automatically respond with a clarifying statement or question. Here are some possible responses to that question:
You don't have a meeting with George.
Which George do you mean?
The meeting with George was canceled.
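The checks behind these three responses can be sketched as ordinary code. The sketch below is written in plain Python rather than as InterLing rules, and it does not show how the SLAPI inference engine works internally. The Meeting record and the answer_when function are assumptions invented for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Meeting:
    """Hypothetical calendar entry."""
    person: str             # full name of the other attendee
    when: str               # e.g. "3:00 on Friday"
    canceled: bool = False


def answer_when(first_name, meetings):
    """Answer 'When is my meeting with <first_name>?', checking the
    assumptions behind the question before answering it."""
    matches = [m for m in meetings if m.person.split()[0] == first_name]

    # Assumption 1: such a meeting exists.
    if not matches:
        return f"You don't have a meeting with {first_name}."

    # Assumption 2: the name identifies exactly one person.
    if len({m.person for m in matches}) > 1:
        return f"Which {first_name} do you mean?"

    # Assumption 3: the meeting is still scheduled.
    active = [m for m in matches if not m.canceled]
    if not active:
        return f"The meeting with {first_name} was canceled."

    return f"Your meeting with {first_name} is at {active[0].when}."


calendar = [
    Meeting("George Smith", "3:00 on Friday"),
    Meeting("George Jones", "10:00 on Monday"),
]
print(answer_when("George", calendar))
```

With two different Georges on the calendar, the sketch asks which one is meant instead of guessing. In a hand-written application, checks like these have to be coded separately for every kind of question.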
Global Support
Suppose that you need localized versions of your application for five different languages, including Danish. Perhaps your Danish is a bit rusty. That's not a problem under SLAPI. Each grammar is developed and tested by the vendor. SLAPI includes test corpora to assure consistency and completeness across languages. Simply change the call to SetLanguage( Lng_Danish ) and you have a Danish version.
Hvornår er mit møde med Georg?
Dit møde med Georg er klokken 3 på fredag.
SLAPI is a complete framework for building conversant, natural-language applications. It is an open, well-defined system that anyone can write components for or build applications with. Unlike SAPI, SLAPI can run on a variety of platforms including the new handhelds and consumer products.
Because SLAPI facilitates natural language parsing, the developer can use commercially written grammars and is not forced to write their own. This speeds both development and localization. Because SLAPI includes standard test corpora for grammars and components, quality and consistency are improved.
Kurt Fuqua is president of Cambridge Group Technologies and can be reached at 76061.3350@compuserve.com
Scaleable Language API is a trademark of Motorola