Audio Search and Mining: A Look at Unstructured Data
Search engines such as Google are receiving about 1 billion search requests per day. Taking the ability to query text and applying it to voice opens up many areas of opportunity. As it pertains to the Web, users can search audio files and audio-video feeds. Enterprises can use this technology to find important customer concerns and even potentially enable employees to search voicemails or recorded calls for key words and phrases. It would seem that the sky is the limit for this technology, which is why Speech Technology magazine pulled together a group of experts to determine the limitations and opportunities for the audio search and mining industry.
Stephanie Staton, associate editor of Speech Technology magazine, conducted a roundtable discussion with Judith Markowitz, president of J. Markowitz Consultants and technology editor for Speech Technology magazine; Anna Convery, senior vice president of marketing and product management for Nexidia; Yochai Konig, cofounder and chief technology officer for Utopy; Larry Mark, chief technology officer at SER; Daniel Ziv, vice president of customer interaction analytics and business interaction intelligence at Verint; and Joe Watson, director of advanced technology for Witness Systems.
Speech Technology magazine: Can you describe what the audio search and mining market looked like three to five years ago?
Convery: Three to five years ago, speech analytics was a really wonderful idea and it was something that wasn't being applied in the commercial world. Most of us brought products to market several years ago. In the commercial marketplace, we really only have been bringing products to market in the past two to two-and-a-half years.
Markowitz: There were some companies that did have technology (commercial or semi-commercial) a number of years ago, such as AT&T, IBM, Lernout & Hauspie, and BBN. Those technologies were developed in the 1990s but weren't as commercially oriented as the technologies today. Essentially, the market is maybe three years old in terms of the commercial offerings that companies have that are really product-oriented rather than aimed at R&D and very early adopters of bleeding-edge technology. This is no longer bleeding edge.
Konig: Basically, we had a commercial product deployed in 2002, but it was for limited application from the whole scope of speech analytics as we perceive it today.
Ziv: We deployed our first commercial product in early 2003, but the market was very small and immature. The technologies that you are speaking of are really fundamental engines, but there were not speech analytics applications deployed in a commercial environment before three or four years ago.
Watson: The technologies that we are referring to are really coming out of other industries, like government, as well as the stuff that came out of the IVR world. That was a more mature market three to five years ago than speech analytics was at that time.
ST: What happened in the last three to five years to further the market's growth and acceptance?
Konig: What has happened is the expanding of the scope of the business value that speech analytics can bring. The initial business value was more on the agent quality monitoring type of application. In the last few years, we have seen an expansion to other business benefits, more specifically, to the business intelligence side. Basically, companies were able to get insight about different business processes that are not necessarily about the agent, but about the customer: what the customers' issues are, why they are calling, how the company can make them happier, how the company can sell to them more effectively, and so forth. The expansion of the business value is fueling the growth of this market.
Convery: What you see is a maturity of how people look at speech analytics. Instead of being what it was two or three years ago, a very efficient tool for looking at what your agents are doing, it now supports a very significant business case built around strategic initiatives for organizations: getting that intelligence to the organizations in an efficient manner, and then having them act upon it. If you were to look at some of the business cases that were developed two or three years ago and compare them to today, you would see quite a dramatic difference, both in terms of sophistication and the return on investment.
Ziv: It's really about having more access to information, and the business intelligence market is part of that, but it mostly focuses on structured data versus unstructured data. Unstructured data is a very new market. The amount of unstructured data is much larger than the amount of structured data and the potential value there is much greater because it has much deeper insights. It is actual customers talking and telling you what they want rather than you just knowing how old they are and where they live. The value is there, and the potential is taking some of the tools, processes, and consulting around what has been done in the business intelligence community and applying it to this unstructured data and linking it to the CRM world and the business intelligence world.
The area of greatest growth is in people realizing that this information exists, that it is available, that you can mine it, and that you can extract information from it. In 2010, this market is probably going to be much larger, but it is going to be combined with a lot of other things that are part of CRM initiatives, business intelligence programs, and enterprise-wide information systems. These are multibillion-dollar markets that are ready, but just aren't using this information today.
Mark: In parallel to these applications moving up the food chain (going from just agent monitoring or quality assurance to more of a business intelligence application providing greater value), the technologies have improved as well, both the underlying computers and the speed at which the engines process. At the same time that we are providing more value, we are providing it, at least from a capital investment perspective, at a much lower price point because there is a much lower capital investment needed to get value out of systems than there was even three years ago.
Watson: Another aspect that we also see is that it gives us another portal into customer insight, into what the customer is really talking about. It is not the only portal, but it gives an additional feed of information so that businesses can now understand what a customer is saying and how he is truly reacting.
Markowitz: One of the things that makes this more understandable to the market and fit in with the whole structure of what they're already doing is that many companies have combined this with things that are already in the environment of the customers. They can see it as an extension of what they are already doing and it is easier to understand. That makes it easier to do all of the other things that have been discussed already.
ST: What counterforces have been—and still are—stalling the growth of the audio search and mining industry?
Ziv: Like any new technology, there are a lot of misconceptions about what it is. Some people see this as part of or similar to speech IVR and compare the advantages and disadvantages of that. It is a very different application, requires very different technology, and requires different types of services around it. It attaches itself to different types of platforms, and while it does draw from the same core basic technology as speech recognition, it is very different in terms of its application in the market. That has caused some confusion and potential delays in deploying this because a company that has already deployed speech recognition IVR bases its decision on that experience. The advantage for speech analytics is that the speech recognition doesn't need to be as accurate as it would for a speech IVR system because you have the advantage of statistical information. You have a lot of calls and a lot of words to look for, so even if you miss a word here or there, which you always do in speech recognition, it doesn't have the same effect as a customer talking to an IVR and getting upset because it didn't understand his last word.
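Ziv's point about statistical information can be illustrated with a little arithmetic. The sketch below is not drawn from any vendor's product; it assumes each spoken occurrence of a phrase is recognized independently with the same probability (the name `recall` and the 0.7 figure are illustrative assumptions) and shows how quickly the chance of surfacing a topic rises as it is mentioned across more calls.

```python
def detect_at_least_once(recall: float, mentions: int) -> float:
    """Probability that at least one of `mentions` independent occurrences
    of a phrase is recognized, given a per-occurrence recognition rate."""
    if not 0.0 <= recall <= 1.0:
        raise ValueError("recall must be between 0 and 1")
    if mentions < 0:
        raise ValueError("mentions must be non-negative")
    # Every occurrence is missed with probability (1 - recall) ** mentions.
    return 1.0 - (1.0 - recall) ** mentions


if __name__ == "__main__":
    recall = 0.7  # assumed per-occurrence rate, not a measured figure
    for mentions in (1, 5, 50):
        print(mentions, detect_at_least_once(recall, mentions))
```

A single caller speaking to an IVR is the `mentions = 1` case, where every miss is felt directly; an analytics system looking for a recurring customer issue operates at the other end of the range.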
Convery: Certainly our experience is that accuracy is very important and that our customers do not have tolerance for less-than-accurate results. They do expect speech analytics technology to generate very accurate results. They expect to be able to depend on it, especially when they look at the mission-critical applications where they are trying to find something that needs to be acted upon very quickly, like a breach in a compliance statement or customer identification information. I certainly see customers who are very demanding and very exacting of that and who stand by that.
Ziv: I agree, but that is part of what is hindering growth. None of the applications that are out there today, and I have seen all of them, are 100 percent accurate. You may identify the word correctly, but if you don't understand the context of that word then you are missing the point. The perception is that we'll wait until the speech recognition technology is 100 percent accurate, and it's what potentially makes marketing and education a challenge for all of us because I don't see it being 100 percent accurate in the next couple of years.
Convery: I agree with that, but people expect to have a higher rate of accuracy. People are becoming more educated on what it can do; and a key point is that people who try to say that this can be 100 percent accurate are clearly misleading the marketplace. A high degree of accuracy is important and we have to deliver on that. Even if you are using other logic and other rules to get context, if you have significant inaccuracy in the core search all you do is amalgamate inaccuracy upon inaccuracy and your results are still not going to be good enough.
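Convery's remark about inaccuracy compounding can be made concrete in the same way. As a minimal sketch, assume a business rule fires only when several distinct phrases are all found in a call, and that each phrase is found independently with the same accuracy; both assumptions, and the function name `rule_hit_rate`, are illustrative rather than a description of how any product works.

```python
def rule_hit_rate(phrase_accuracy: float, phrases_required: int) -> float:
    """Chance that a rule requiring all `phrases_required` phrases fires
    on a call that truly contains them, assuming independent detections."""
    if not 0.0 <= phrase_accuracy <= 1.0:
        raise ValueError("phrase_accuracy must be between 0 and 1")
    if phrases_required < 1:
        raise ValueError("phrases_required must be at least 1")
    # All phrases must be found, so the individual accuracies multiply.
    return phrase_accuracy ** phrases_required


if __name__ == "__main__":
    for accuracy in (0.9, 0.7):
        print(accuracy, rule_hit_rate(accuracy, 3))
```

Under these assumptions, a three-phrase rule built on 90 percent phrase accuracy catches about 73 percent of matching calls, while the same rule at 70 percent accuracy catches about 34 percent.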
Konig: To increase the value for a solution, it needs to be more specific to a vertical and even a certain department within a vertical. If you look at the analysis of the business intelligence market, you see that a company has a specific offering for financial, insurance, etc. One of the trends that we see going from horizontal technology platforms to more specific solutions and applications is that it can articulate to a matrix. If you imagine a matrix where one dimension is a vertical, and another dimension is a department within that vertical, there are many different departments in many different verticals. All of the vendors today are in the process of getting points in this matrix and getting a more specific, more specialized, and more impactful solution in each one of these.
Markowitz: Something that has actually been a focus of this industry in terms of improving over the years is the issue of scalability. This has been directly attacked by many vendors. It was a problem initially, as with any other new, emerging technology, but it is going away now because there is more and more integration.
Cost was more of a barrier than it is today, especially given the value that is coming through at the enterprise level. Also, a couple of the companies in the industry—and Nexidia is one of them—are offering managed services, which allows the market to extend down a little. Another area is quality reporting. The reporting that exists now is worlds above what it was before. Its usability in accessing the information was part of that issue.
ST: Where do audio search and mining technologies need to be pushed or prodded to evolve?
Konig: To identify relevant issues from a business point of view, we have to get into the world beyond the phrase and get into the context of the exchange between the customer and the agent. You have to be able to understand a sequence of phrases over a period of time. To get all of the information that the company wants and to get the meaning of it is very challenging from a technology point of view. That is the next frontier as all the vendors are evolving from getting a word or phrase here to getting the context of a specific issue, the specific dimension, the caller's specific exchange, and the order that it is all happening. This is a challenge from both an understanding and completion point of view. The technology is evolving to meet more and more sophisticated business needs.
Ziv: There is a lot more on the technology front, so I wouldn't call this technology mature in terms of that. It is mature in terms of the ability to deploy and get value from it today. As for the question of whether we've finished, whether we've done everything we can do with this information, we haven't even started.
I draw upon the business intelligence market, where you spent years just cleansing data and pulling it together to be able to report on it. It became a billion-dollar market. I think the amount of data, the richness of the information, and what we can do with it shows that, as we are currently applying data mining to the results of the speech analytics, we are taking a new technology and applying and combining it.
There is a tremendous amount of technology innovation that we will see. The success lies with the application, the services, how it is deployed, where it fits in, and addressing concerns of customer privacy and the security of these very sensitive calls. At the end of the day, technology alone doesn't solve any problems; it has to come hand in hand with the right application, the right marketing, and the right education. But, I do strongly believe that this is going to be a huge market and will weave itself into a lot of other spaces that we don't see today.
Convery: We've made so many strides with the technology and how we apply best practices around it. We really are delivering a lot of value right now, but in voice there is so much richness. It is our favorite medium with which to communicate with each other. There is so much there that can be researched and developed and certainly I know we continue to very, very heavily invest in research and development and continue to bring innovative analysis of the voice and various things. It is not just the word, but all the other qualities of the voice as well.
It is going to be a very interesting marketplace in terms of how we are really starting to deliver on the value, not only with some significant customer announcements over the next 12 months, but also with some very interesting technology announcements. It is a very active, growing, interesting marketplace to be in. It has tremendous impact across the entire organization.
ST: What about other languages?
Ziv: The fundamental speech analytics layer is really language-independent. Obviously it does depend on the speech recognition engine, which is language-dependent and requires customization for each language or accent.
English is the dominant language right now because the English-speaking markets are the most dominant markets that focus on understanding customers. The reason why English is dominant and most speech IVR and other speech recognition systems have been deployed in English is also because that is where the biggest revenues are today. That will obviously change over time, but I don't think it is really a technology question. It is much more of a business opportunity question.
If there is a huge market in any other language, it could be a problem for any of the vendors to develop that language and support it. The challenge would turn to the kind of deployment model and services being addressed to the specific business issues that a country or region has. But, overall, I don't think language is a real barrier. I think it is just a matter of prioritizing where the biggest market lies and where everybody focuses.
Convery: A phonetic approach—a different approach for supporting different languages, dialects, and accents—is something that we do a lot of. We have more than 30 languages right now, but we address many other markets as well with this technology, even though some of the languages we support may not be used in the contact center enterprise. It could be for rich media or e-discovery or whatever other markets there might be.
There is a growing desire certainly in European and Asian countries for speech analytics. Those languages are being used and, as the markets grow, you'll start to see more of the languages coming in.
Language support is important. When you sit down with an international, global company that is doing business in 15 or 16 languages, it becomes a very important question for them. The ease of support and of making sure that that language is indeed a language model that supports the way the language is spoken without too much cumbersome, heavy-duty training is important from a cost and maintenance standpoint for those organizations.
ST: Which new vertical markets that haven't already benefited from audio search and mining applications are poised to do so now?
Convery: I see a lot of education going on within the enterprise for other divisions to use speech analytics. For example, I can go ahead and do my research based on what a call driver is, what the churn is, and what is going on during a call, and then use that intelligence in other divisions, like marketing, sales, etc. What you are seeing is a growth in that area.
When it comes to other verticals, we are seeing additional attention from more services-oriented companies that are keenly aware of the quality of the service and the value of the service of the customers they are serving and getting intelligence from that. From hospitality to retail, all of those organizations are picking up on and starting to use speech analytics.
Watson: Any specific vertical is hard to say, but it is really about the customer. More and more people want to understand what a customer is saying and it is all about customer satisfaction. But in particular, we see a lot of uptake in broadband and telecom, where there is a lot of churn and trying to maintain customer satisfaction is important. These industries are starting to see a lot of value in how to understand and potentially predict where the churn is and to maintain customers and keep those customers happy.
Markowitz: We shouldn't limit the discussion to the contact center because I know that some vendors are working with podcasts, media research, and things of that nature that go beyond the call center. That is, of course, where the technology is though.
Konig: I see the utilization of this product and solution going up the food chain to the CEO and the C-level because the value is getting more impactful to the bottom line of the companies. Giving an education and showing the real value to C-level people, who start asking about the types of results that they are getting, will be very positive to the whole industry.
Ziv: In addition to different verticals, different types of interactions can now be recorded. We have already had several projects where people wanted to record their interactions at branches. It is true with some of our telcos and financial institutions that they also have branches or locations where people are talking and getting information and then calling and getting different information. If the customer calls and gets different information over the phone than he received at the branch, you don't have the complete picture. That is where another direction of expansion is into the field of where else people are talking. It goes beyond what happens in the contact center to what is happening everywhere else in the world as well.
ST: What are the potential uses for this technology outside of the contact center? What are some of the exciting things out there?
Convery: If you were to go up to 11alive.com and see their video search for their news broadcast, that is us searching their rich media. It is not actually about whether this technology will be applied; it is, and it is growing very fast. With e-discovery, the Federal Energy Regulatory Commission uses our technology to do several searches. In those marketplaces where you have more audio and audio-video, people want to search and they see its challenges. We have a lot of intelligence, a lot of findings, a lot of people speaking, we've got noise, we've got dialects, and those areas are growing as well. It has certainly been a very high-growth area for us.
Markowitz: I can see multilingual search and the consolidation of information as an extension of what has already been said. There is no reason, especially in a global company or enterprise, that you need to do your consolidation only in one language. You can combine it with translation and then combine the concept levels to see what is happening that might be similar or different across the different installations that you have.
ST: What are your expectations of the market during the next five years?
Ziv: The question is whether, when some of these applications are sold, they will be combined with quality monitoring applications, sold as stand-alone products, or offered as services. Datamonitor estimated the market for 2005 at $50 million and estimates that it will grow to $218 million by 2010. The majority of that is still in North America, but there is expected growth in India, Asia, and the Pacific as well.
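For a sense of scale, the Datamonitor figures Ziv cites imply a compound annual growth rate that can be worked out directly. The sketch below uses only the two numbers quoted ($50 million in 2005, $218 million in 2010) and the standard compound-growth formula; the function name `cagr` is an illustrative choice.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values over `years` years."""
    if start_value <= 0 or end_value <= 0 or years <= 0:
        raise ValueError("values and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


if __name__ == "__main__":
    # $50 million (2005) to $218 million (2010), five years apart.
    rate = cagr(50.0, 218.0, 2010 - 2005)
    print(f"Implied annual growth: {rate:.1%}")
```

The estimate works out to roughly 34 percent growth per year over the five-year span.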
Markowitz: Right now the market is defining itself and one of the issues related to this, which is not unlike speech recognition, is that when you look at these technologies, you really find that this is not something that you can say is stand-alone. It goes with other kinds of things, so it is part of various other markets. When analyzing, analysts sometimes look at different kinds of things to figure out what the size and scope of the market are.
ST: How pervasive do you expect this technology to be?
Ziv: Speech is, by definition, the most comfortable form of communication, especially about important things. If you look at the types of interactions that are self-served, it is usually the mundane, such as going to an ATM; but if you want to have a discussion with your banker about where you should invest, then speech is still the predominant form and probably will continue to be for many years.
The two areas where speech technology is being addressed are for machines to be able to understand when people talk to them and for what we can learn from the interactions that people are having. Both of these are the bases of the current knowledge and current transfer of communication. There are so many ways that you can apply this. Anywhere that people are talking to each other, there is knowledge that could be used for business intelligence, for companies to understand their customers, for personal usage. Just like you have an application that allows you to search your emails, you might be able to search your phone calls or other things.
I don't know where this is going to go in the future, but the reality is that technology is here to allow you to use this spoken information. The information that lies there is very rich, very powerful; and it really just amounts to the imagination and the right way to apply it. What makes it really hard to scope this market is that the potential is so large that it is hard to put a boundary around it.
Konig: If companies would use this technology to become adaptive, dynamic, and essentially to customize themselves to every customer based on their needs and the optimal way to serve them, it could only increase satisfaction, loyalty, and the value of the company. All of us as consumers will benefit in our day-to-day life. The business case that this technology can bring to enterprises is its ability to be more responsive, more adapted to the customer, and better equipped to create positive cycles to support both consumers and enterprises.
Markowitz: There is a deployment of audio with video cameras on street corners of one town, possibly in Europe, in a danger area where officials are listening and watching for threatening communications by terrorists and the like. That is a totally different world of use that would affect everyday life. But, when we think about this technology, there isn't really a boundary. Every time you think about it, you can look at it in a different way. People are using it in such creative ways that if we have this conversation next year, it will perhaps be completely different.