Factors of 10, or, Is This Trip Necessary?

As some of you are acutely aware, autumn involves large checks going to tuition (at all levels these days). Which brings me immediately to my first factor of 10: I'm pretty sure that when I went to college, tuition was 1/10th of what I'm paying for Vassar this fall. Other factors of 10 that come to mind include the following strings:

Total ASR-based outsourced teleservices (global) of around $200 million versus total revenues for network-based voice messaging of around $2.5 billion versus the outsourced teleservices market (live agent plus consulting, the whole ball of wax) now approaching $20 billion;

ASR ports installed of around 200,000 versus IVR ports of approximately 2.5 million versus VoIP ports installed of around 5 million (OK, it's only 5X there);

WiFi chipset (baseband and RF) of around $15 to $20 versus GPRS 2.5G handset of around $150 to $200 versus wireless Tablet PC of around $2,000;

Universe of server-based ASR engine companies of around 10 versus VRU/media gateway producers of around 100 versus PBX/ACD/communication server manufacturers of around 1,000.
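For readers who want to check the arithmetic, here is a minimal Python sketch that computes the step-to-step ratios in the four progressions above. The dictionary keys and the step_ratios helper are our own illustrative names, and the price ranges are reduced to their midpoints, which is an assumption rather than anything stated in the figures. On the numbers as quoted, the IVR-to-VoIP step works out to about 2X.

```python
# Figures as quoted in the four progressions above.
# Assumption: price ranges ($15 to $20, $150 to $200) are replaced by midpoints.
progressions = {
    "revenue_usd": [200e6, 2.5e9, 20e9],           # ASR outsourcing, voice messaging, teleservices
    "ports_installed": [200_000, 2_500_000, 5_000_000],  # ASR, IVR, VoIP
    "unit_price_usd": [17.5, 175.0, 2000.0],       # WiFi chipset, GPRS handset, Tablet PC
    "vendor_count": [10, 100, 1000],               # ASR engines, VRU/gateways, PBX/ACD
}


def step_ratios(series):
    """Return the ratio of each value to the one before it."""
    return [later / earlier for earlier, later in zip(series, series[1:])]


if __name__ == "__main__":
    for name, series in progressions.items():
        ratios = ", ".join(f"{r:.1f}X" for r in step_ratios(series))
        overall = series[-1] / series[0]
        print(f"{name}: steps {ratios}; end to end {overall:.0f}X")
```

The end-to-end figure is the "100X ramp-up" that skipping the middle of a series would capture in one jump.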

When we look at these progressions, what jumps out at us is that markets often get gummed up in the middle of the series. Indeed, it's the 10X-to-10X-to-10X odyssey that leaves so many companies by the side of the road. If we could just connect, say, ASR outsourcing capabilities with the larger outsourced teleservices opportunity, going direct to the 100X ramp-up, we would have much more growth and ubiquity. Likewise, if we could just replace DTMF in every switch going out the door, we'd truly have voicetone instead of dialtone.

"Ah," you say, "I can hear the sound of the pig's wings flapping now." But the principle that we are getting at here is simple and real: markets follow free or near-free resources. What we have going in the speech marketplace is the nightmare reverse of this: three or four buying decisions for every one of three or four buying decisions. At this point, you're probably thinking, "OK, we know the market is fragmented, that's why Darwin sailed in the Beagle, to take care of that problem."

But the point is not so much about consolidation (which we all know is well under way) as about jumping over the mid-point of that exponential scale. Typically, the mid-point is just what it looks like: an interim stage on the way to something bigger. The logical outcome is that competitive advantage in the marketplace comes from accelerating beyond that interim stage. Here's a proof-point: in the early '90s, Aspect cleaned Rockwell's and Lucent's clocks in the ACD market by producing a call center ACD solution with inboard VRUs and packaged apps that included voice mail as an option for the caller.

Another one: Cisco made major headway against specialized VoIP gateway vendors by bundling H.323 and other voice software capabilities into its 24XX and 53XX platforms. Putting VRUs inside the ACD skips over the midpoint (Aspect); instantiating VoIP in a data router (Cisco) skips the specialized midpoint stage as well. Embedding new technology in well-established, pervasive platforms is a way to grow markets. It's also a way to shorten time to market. This is not as flashy as starting a whole new company, and therefore has the disadvantage of making the world a more boring place (albeit a more efficient one).

Therefore, assuming we don't all want to live in a world where you can have any color as long as it's black, we close with a few thoughts on speech-related developments that encourage us to move beyond the midpoint, if not leave it out entirely:

License managers: The use of stateful multi-tiered license managers would help service providers — the biggest consumers of speech licenses — manage their investment more efficiently, and encourage more buying.

Telephony-rich middleware: Take the legacy and converged telephony protocols that have for too long justified the existence of expensive specialized platforms and tuck them up under the application server so it can talk to the (phone) network.

Host-based media processing: With apologies to specialized DSP manufacturers, the pendulum between specialized silicon and general-purpose processors swings back and forth. When the clock speeds for the latter are available at a 10X ratio to DSPs, guess who's the mid-point now?

More handset integration: The embedded speech market is languishing because handset manufacturers haven't recognized the value that speech brings to the world's most pervasive client platform (there are now more wireless handsets than fixed phone sets in the world). More importantly, the speech developer community hasn't focused on the need to understand J2ME, BREW and other wireless client stacks.

If this whole piece sounds a little service provider-centric, we plead guilty. That's because in terms of making speech ubiquitous, we still maintain that the service provider community has the potential to create the greatest volume of speech encounters at the end-user level, more than enterprise, more than desktop/Web, more than consumer electronics in the next few years.

Mark Plakias is a partner and senior consultant for The Zelos Group. He can be reached at (212) 366-0895.
