Speech Recognition Puts Content into Context

Typically, ads associated with online media book-end a clip and are not always topically relevant. Digitalsmiths Corp. is looking to change that, and announced yesterday that it had successfully demoed VideoSense, the first fully automated contextual video ad targeting solution that works on multiple platforms, including traditional broadcast, broadband, and mobile.

Digitalsmiths offers an alternative to search engines that latch onto keywords to determine relevancy, an approach that can be imprecise and blind to context. A tag like "baby doll" might refer to a line of toys, a line of clothing, or an adult website; it would be unwise for ads catering to one of those industries to appear alongside content from another.

Using proprietary technology, VideoSense combines traditional Web targeting techniques with video image recognition and speech recognition to pinpoint subjects that might grab the user’s attention. The system accomplishes this by analyzing the actual content of the video, using filters to recognize speech, ambient sound, and music cues, as well as visual cues such as the difference between day and night. VideoSense then provides targeted metadata that can be used to match an ad to the content. Ultimately, this allows for real-time analysis and optimizes advertising potential.

"The one certainty of the future of digital media is that new ad formats and content platforms will continue to evolve," said Digitalsmiths’ CEO Ben Weinberger in a recent press release. "As a result, VideoSense has evolved to be the most comprehensive contextualization solution to address digital video."

"As multimodal search technologies improve, entrepreneurs are experimenting with a greater range of applications for them," says Datamonitor analyst Ri Pierce-Grove. For instance, while Interactive Voice and Video Response (IVVR) systems are already hot in Europe and Asia, AT&T’s suite of 3G phones just brought the technology stateside.

Other industry segments, such as streaming media, are also beginning to affect the speech technology industry. And with the proliferation of online video, a multimodal approach to search and data mining seems not only natural, but necessary. "It represents market efforts in bringing those technologies to bear on a high profile challenge: monetizing online video content," says Pierce-Grove.
