You Say Zee, I Say Zed
George Bernard Shaw spoke of England and America as "...two countries divided by a common language". If Shaw had been speaking technologically, he would have been sowing the seeds of localization. The need to localize a software product has been understood for several decades. Localization (frequently abbreviated "L10n", for "L" + 10 letters + "n") involves customizing software for a particular country. It includes translating menus and messages into the native language, as well as changing the user interface to accommodate different alphabets and cultures. It also affects documentation, packaging, etc. A locale represents a set of conventions determined by language and customs; these conventions include the spoken language and the formats for dates, numbers, currency, units of measurement, etc.
In the graphical user interface (GUI) context of software such as Mozilla, Windows or Mac OS, L10n issues are reasonably well understood. If content for one locale is displayed in a user's GUI that is handling a different locale, it is relatively easy for the user to recognize that fact visually (different script; jumbled characters; a priori knowledge that a particular site has a particular address, e.g. ".com" instead of ".uk" or ".jp", etc.). Specific guidance on localization issues in graphical user interfaces is easy to find in published sources, but many of the issues on which those sources focus (the size of dialog box elements after translation, alphabetic sorting conventions, character representations, etc.) do not translate easily to the voice environment.
In a voice user interface (VUI), such contextual clues are not accessible to the user, and the entire system may not work or may give erroneous data (e.g. misread dates). Accordingly, designers of VUIs need a systematic approach to the L10n of voice applications, tools for carrying it out, and sensitivity to the issues involved.
This article gives an overview of some core issues involved in localizing speech applications, intended for use in a voice-driven user interface, to run in British English (UK English). Elements and ideas may be incorporated in a voice application development environment to allow rapid prototyping of a 'foreign' language version of a voice application. While localization to French, for example, might provide more arresting instances of challenges in L10n, examples for UK English are used here because the language transfer issues are transparent to a wider readership.
The L10n of a voice application has several subsidiary components, including:
1. translation (including syntactic and semantic normalization) of text, recorded prompts and grammars
2. generation of new grammars and prompts to accommodate likely input changes (e.g. replace US Cities/States list with cities/counties list in the UK, etc.)
3. generation of a localized pronunciation dictionary (e.g. phonetic transcription of words for speech recognition and text-to-speech (TTS) purposes)
4. manual refinement of the application for cultural issues (e.g. conversational style of the application).
Many of these steps can be automated partially or entirely, and are better explained elsewhere. Other components of a VUI system that will be affected are conversions in the areas of text field recognition, presentation and parsing. Grammar-building considerations (for grammars deployed in the automatic speech recognition program) lie outside the scope of this discussion. Instead, we present multiple examples of some types of text that might be encountered in supplied audio content and in documents written in British English, for some common VUI topics of interest.
Localizable items
Diagram 1
Dates: European and Scandinavian formats
Time: 24-hour clock, a.m., p.m., seconds, etc.
Numbers: single digits, pairs, sets of three, telephone numbers (international differences in grouping), with decimal point, signifying part numbers, fractions, etc.
Currency: pre- and post-positioned symbols and amounts
Measurements
Symbols: mathematical, equations, Greek characters
Names: personal and commercial, place names, trademarks etc.
Titles: Dr., Mr., M., Jr., Sr., etc.
Addresses: different local formats
Abbreviations vs. Acronyms
Ambiguous abbreviations
Homographs
Character names: ASCII and standard Roman
Text input that needs L10n most commonly includes items in diagram 1. Space limitations preclude discussion of all these areas. Instead, some key issues are chosen for focus, namely those that appear frequently in typical voice-driven applications. Some applications available currently that might be affected by L10n are listed in diagram 2.
US name | Suggested UK name (if different) |
Menu | |
My Favorites | My Favourites |
Blackjack | Pontoon |
Restaurants | |
Movies | Films |
Driving Directions | |
Travel | |
Ski Reports | |
Stock Quotes | Stock Market |
Sports | |
News | |
Traffic | |
Weather | |
Soap Operas | (no equivalent) |
Lottery | |
Time | |
Among the most frequently accessed menu items are News, Stock Quotes, Travel and Sports. Exploring the L10n text fields (Numbers, Currency, Dates, Abbreviation and Acronym resolution, inter alia) for these topics would require several more articles of similar length to this one. A narrowing of focus is needed: two popular items that provide instances of L10n needs are Movies and Driving Directions. Accordingly, special attention is given here to the L10n of Addresses and Time. Text fields fall into three natural groups: those that contain largely alphabetic characters; those that contain largely numeric characters; and those that contain alphanumeric strings. Within the text fields discussed below, the character sets affected are designated by braces, e.g. {Character set}.
Addresses
The format of postal addresses differs greatly between countries. This section is limited to the formats commonly used in the United Kingdom. Other countries will have language-specific features and additional characters. Note that the United Kingdom uses "Postcodes" or "Postal codes" that are alphanumeric. France, by contrast, uses postcodes that are solely numeric, restricted to five digits and placed before the town/city name. The term "Zip code" is unique to the United States and should not be used in applications for other locales. Use of punctuation is highly variable, so the text field detector will have to be insensitive to its (non-)inclusion. Addresses may be full or partial, and they include elements from the common formats in diagram 3.
Format | Examples |
{Title}{Name}{Title} {Number}{Location}{Street} {City/Town}{County} {Postcode} {Country} | Mr. Andrew Harris 10 High St. Guildford, Surrey GU14 7QH England |
{Title}{Name}{Title} {Number}{Location}{Street} {Town/Village} {County}{Postcode} | Dr. A. Harris 10 High St. Old Marston Oxon. OX3 0PR |
{Title}{Name}{Title} {Location} {Town/Village} {County}{Postcode} | Rev. Harris The Old Rectory Old Marston Oxon. OX3 0PR |
{Title}{Name}{Title} {Location} {Street} {Town/Village} {County}{Postcode} | A. Harris Esq. The Bakehouse Crabtree Lane Old Marston Oxfordshire OX3 0PR |
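One way to make a text field detector indifferent to punctuation, as required above, is to allow any run of spaces or punctuation between the two halves of a postcode. The following is a minimal sketch only: the function name `find_postcode` and the pattern (one or two letters, a digit, an optional further letter or digit, then a digit and two letters) are assumptions made for illustration, and the pattern does not validate against the official list of postcodes.

```python
import re
from typing import Optional

# Assumed shape of a UK postcode: an outward part such as "GU14" or "OX3",
# then an inward part made of one digit and two letters, such as "7QH".
# Any mix of spaces, full stops, commas or dashes may separate the two parts.
POSTCODE_RE = re.compile(
    r"\b([A-Z]{1,2}\d[A-Z\d]?)[\s.,-]*(\d[A-Z]{2})\b",
    re.IGNORECASE,
)


def find_postcode(text: str) -> Optional[str]:
    """Return the first postcode-like string in text, normalized, or None."""
    match = POSTCODE_RE.search(text)
    if match is None:
        return None
    outward, inward = match.groups()
    return f"{outward.upper()} {inward.upper()}"


print(find_postcode("Guildford, Surrey GU14 7QH England"))
print(find_postcode("guildford gu14-7qh"))
```

Normalizing to upper case with a single space gives later stages (letter-by-letter reading, for example) one consistent form to work from, whatever punctuation the source text used.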
Members of the set {Title} will be derived from a subsection of a specified "Abbreviations" dictionary. The "Abbreviations" dictionary will include pre-positioned titles, post-positioned titles, abbreviated titles, full titles, hyphenated titles, etc.
Members of the set {Name} will be derived from a special "Proper Names" dictionary. The "Proper Names" dictionary will include abbreviated first names, full first names, last names, etc. Initials preceding full names (e.g. Mr. S. Andrew Harris) should be handled competently by a (normal or default) single letter detection mode.
Members of the set {Number} may need to include a special subset to be used in the text field "Address". This would ensure that, for example, "No." and "-B" are read correctly in this domain (i.e. as "number" and not "No"; and as "B", and not "dash B" or "minus B").
Members of the set {Location} will be derived from the main dictionary and will include, for example, the following: {building, flat, room, suite, box, floor, unit}. If the {Location} appears in its abbreviated form (e.g. "bldg." or "Ste."), then it will be looked up in the {Abbreviations} list.
Members of the set {Street} will be derived from the "Proper Names" and the main dictionaries and will include, for example, the following types of "streets": {Road, Street, Drive, Avenue, Lane, Crescent, Way, Circle, Arcade, Park}.
Members of the set {City} may be derived from the special "Proper Names" dictionary.
Members of the set {County} will be derived from a special "County Abbreviations" list. Unlike abbreviations for States in the United States, the United Kingdom uses true truncations of county names, e.g. "Worcs." for Worcestershire and "Leics." for Leicestershire. Some counties with one- or two-syllable names are never truncated or abbreviated, e.g. Kent, Sussex, Suffolk.
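The address-specific readings described above can be sketched as a small text normalization step that runs before text-to-speech. This is an illustration under stated assumptions, not a definitive implementation: the two dictionaries are hypothetical excerpts of the "County Abbreviations" and "Abbreviations" lists, and the function name `expand_address_line` is invented for the example.

```python
import re

# Hypothetical excerpts of the lists the article describes.
COUNTY_ABBREVIATIONS = {
    "oxon": "Oxfordshire",
    "worcs": "Worcestershire",
    "leics": "Leicestershire",
}
LOCATION_ABBREVIATIONS = {"bldg": "building", "ste": "suite"}


def expand_address_line(line: str) -> str:
    """Rewrite one address line so that it is read as intended."""
    # "10-B" -> "10 B": the dash is dropped, not read as "dash" or "minus".
    line = re.sub(r"(\d)\s*-\s*([A-Za-z])\b", r"\1 \2", line)
    # "No. 10" or "No 10" -> "number 10" (only directly before a digit).
    line = re.sub(r"\bNo\.?\s*(?=\d)", "number ", line, flags=re.IGNORECASE)

    words = []
    for token in line.split():
        # Look tokens up without their punctuation, so "Oxon." == "Oxon".
        key = token.strip(".,").lower()
        trailing_comma = "," if token.endswith(",") else ""
        if key in COUNTY_ABBREVIATIONS:
            words.append(COUNTY_ABBREVIATIONS[key] + trailing_comma)
        elif key in LOCATION_ABBREVIATIONS:
            words.append(LOCATION_ABBREVIATIONS[key] + trailing_comma)
        else:
            words.append(token)
    return " ".join(words)


print(expand_address_line("No. 10-B High St."))
print(expand_address_line("Old Marston Oxon."))
```

Counties that are never truncated, such as Kent, simply fall through unchanged. An abbreviation such as "St." is deliberately left alone in this sketch, since resolving it needs context that a lookup table cannot supply.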
Members of the set {Postcode} should not require a special "Postcode" list, since they should be handled by the correct operation of the {Number} and individual letters options. However, a stipulation should be made available to pronounce the character {O} in Postcodes solely as "oh". {Dash} should always be pronounced as "dash", not "hyphen" or "minus", in this semantic location. In keeping with the lack of sensitivity to punctuation in general, provision should be made for the string {POBox} to be read correctly, regardless of whether it appears as, e.g., P.O. Box, PO Box, P O Box, po box, etc. Members of the set {Abbrev} will be derived from a list of specified abbreviations.
Time
Time may be expressed according to the 24-hour clock (a.k.a. "military time" in the US), as digits less than 12 followed by "o'clock", digits greater than 12 followed by "a.m." or "p.m.", including the {Word} differentiators "midday", "noon" and "midnight", and as digits, separated by colons, expressing hours, minutes and seconds only. Use of punctuation in Time is highly variable, so the VUI application will have to be insensitive to both its variability and its (non-)inclusion. Time may be full or partial, and it may include elements from the common formats below.
Examples
{Hour}{Minute}{Second}{AmPm}{o'clock}{Word}
12:04
02:04
2:04
23:04
12:40
12:55
2:04 a.m.
2:04 p.m.
2.10 am
2.10 pm
12:00 noon
12 midnight
0:00:00 midnight
01:13:42
01:13:42 am
1:13 o'clock
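The variability in these examples (colon or full stop as separator, optional seconds, "a.m." with or without full stops, a trailing {Word}) can be absorbed by a single tolerant parser. The sketch below is one possible approach; the names `ParsedTime` and `parse_time` are hypothetical, and a real detector would also need context to tell "2.10" the time from "2.10" the decimal number.

```python
import re
from typing import NamedTuple, Optional


class ParsedTime(NamedTuple):
    hour: int
    minute: Optional[int]
    second: Optional[int]
    am_pm: Optional[str]  # "am" or "pm", full stops removed
    word: Optional[str]   # "o'clock", "noon", "midday" or "midnight"


TIME_RE = re.compile(
    r"""^\s*(?P<hour>\d{1,2})
        (?:[:.](?P<minute>\d{2}))?
        (?:[:.](?P<second>\d{2}))?
        \s*(?P<tail>a\.?m\.?|p\.?m\.?|o'clock|noon|midday|midnight)?\s*$""",
    re.IGNORECASE | re.VERBOSE,
)


def parse_time(text: str) -> Optional[ParsedTime]:
    """Split a time expression into its elements, or return None."""
    match = TIME_RE.match(text)
    if match is None:
        return None
    tail = (match.group("tail") or "").lower().replace(".", "")
    am_pm = tail if tail in ("am", "pm") else None
    word = tail if tail and am_pm is None else None

    def to_int(value: Optional[str]) -> Optional[int]:
        return int(value) if value is not None else None

    return ParsedTime(
        int(match.group("hour")),
        to_int(match.group("minute")),
        to_int(match.group("second")),
        am_pm,
        word,
    )


for sample in ["2:04 a.m.", "2.10 pm", "12 midnight", "0:00:00 midnight"]:
    print(sample, "->", parse_time(sample))
```

Keeping the {AmPm} marker and the {Word} differentiator in separate fields leaves the decision about what to pronounce to a later step.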
Suggested default setting for UK English Time pronunciation is
{Minute}{Hour}{AmPm}:
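Examples:Pronounced
02:04:"Four minutes past two a.m."
2:04:"Four minutes past two a.m."
2:04 a.m.:"Four minutes past two a.m."
2:04 am:"Four minutes past two a.m."
02:04:00:"Four minutes past two a.m."
2:04 o'clock:"Four minutes past two a.m."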
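The {Minute}{Hour}{AmPm} default can be sketched as a small function that spells out the numbers. The function names here are hypothetical, and two behaviours are assumptions that go beyond the examples given: an hour of 12 or more with no marker is read as "p.m.", and every minute value from 1 to 59 is read as "minutes past" (forms such as "quarter to" are not attempted). Whole hours are left out of this sketch.

```python
from typing import Optional

ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = {2: "twenty", 3: "thirty", 4: "forty", 5: "fifty"}


def number_words(n: int) -> str:
    """Spell out an integer from 0 to 59."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("-" + ONES[ones] if ones else "")


def speak_time_uk(hour: int, minute: int, am_pm: Optional[str] = None) -> str:
    """Render a time as {Minute}{Hour}{AmPm}."""
    if not (0 <= hour <= 23 and 1 <= minute <= 59):
        raise ValueError("expected hour 0-23 and minute 1-59")
    if am_pm is None:
        # Assumption: with no marker, hours below 12 are read as a.m.
        # (as in the examples) and later hours as p.m.
        am_pm = "am" if hour < 12 else "pm"
    suffix = "a.m." if am_pm.replace(".", "").lower() == "am" else "p.m."
    hour12 = hour % 12 or 12  # 0 and 12 are both spoken as "twelve"
    unit = "minute" if minute == 1 else "minutes"
    phrase = f"{number_words(minute)} {unit} past {number_words(hour12)} {suffix}"
    return phrase.capitalize()


print(speak_time_uk(2, 4))
print(speak_time_uk(2, 4, "p.m."))
```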
The terms "o'clock", "noon", "midday" and "midnight" should be stripped out from pronunciation if they follow an hour-minute separating colon and two digits, for the sake of naturalness and clarity:
Examples:Pronounced
3 o'clock:"Three o'clock"
3:00 o'clock:"Three o'clock"
3:12 o'clock:"Twelve minutes past three a.m."
12:00 noon:"Twelve noon"
12 midnight:"Twelve midnight"
0:00:00 midnight:"Twelve a.m."
If users want the option of "military time", this might be provided. The alternative pronunciations would then be:
Examples:Pronounced
02:04:"Oh two oh four"
08:00:"Oh eight hundred hours"
24:00:"Twenty-four hundred hours"
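The "military time" readings follow a simple pattern that can be sketched in a few lines: a single-digit hour or minute is prefixed with "oh", and a minute value of zero becomes "hundred hours". The function name `speak_military` is hypothetical, and the reading produced for hour 00 ("oh zero ...") is an assumption, since no example covers it.

```python
ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = {2: "twenty", 3: "thirty", 4: "forty", 5: "fifty"}


def number_words(n: int) -> str:
    """Spell out an integer from 0 to 59."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("-" + ONES[ones] if ones else "")


def speak_military(text: str) -> str:
    """Read an HH:MM string in the 24-hour style; seconds are ignored."""
    hour_text, minute_text = text.strip().split(":")[:2]
    hour, minute = int(hour_text), int(minute_text)
    if not (0 <= hour <= 24 and 0 <= minute <= 59):
        raise ValueError("expected HH from 00 to 24 and MM from 00 to 59")

    # Single-digit hours take a leading "oh".
    hour_part = "oh " + ONES[hour] if hour < 10 else number_words(hour)
    if minute == 0:
        minute_part = "hundred hours"
    elif minute < 10:
        minute_part = "oh " + ONES[minute]
    else:
        minute_part = number_words(minute)
    return f"{hour_part} {minute_part}".capitalize()


for sample in ["02:04", "08:00", "24:00"]:
    print(sample, "->", speak_military(sample))
```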
Syntactic and semantic issues
Common semantic differences between UK and US English are widely known (e.g. pavement~sidewalk; boot~trunk; scarf~muffler, etc.). Many more subtle differences exist in both syntax and semantics, as do 'false friends', where the same word is used to indicate different entities in the two locales (e.g. theatre). The sample dialogues for making a reservation to see a film and for Driving Directions provide numerous instances of both phrasing and individual words that need L10n and linguistic sensitivity.
Sample applications: conversion
The differences that result from L10n to UK English in these areas are encapsulated in three sample dialogues, taken from an existing voice-driven application running in US English, which appear in the sidebar of this article. They indicate the types of localization and syntactic/semantic conversion that are required to localize such dialogues for use in an application for UK English. Differences are indicated by underlining the affected US English text. A indicates the application; U indicates the user.
While this review has been severely limited in the variety of text issues requiring L10n that it could present, it is hoped that sufficient indications have been given to encourage readers to consider the plethora of textual, cultural and linguistic issues involved in this complex task.
Dr. Caroline Henton is a full-time consultant as CTO of Talknowledgy.com. Dr. Henton can be reached at carolinehenton@hotmail.com.