The Voice Fraud Threat Is Real
My late father fell victim to a voice deepfake scam a few years ago. It was the all-too-common grandparent scam, in which a fraudster calls claiming to be a grandchild in trouble. In this case, the caller claimed to be my niece and said she had been arrested for riding in a car with a friend who had drugs in his possession. She needed $7,500 for bail, and it had to be sent in cash to an address in Miami (even though the arrest had supposedly happened in New York). My father fell for it because the caller used a deepfake of my niece’s voice to lend the scam some credibility.
Once my brother and I heard about it, we stepped in, put a stop on the cash shipment, and had it returned to New York. We also called the Miami-Dade Police Department, which had an officer sit on the location until the fraudster showed up to claim his prize.
My father was lucky that we stepped in when we did, but not everyone is that fortunate. Fraud like this happens more often than most people realize, and as speech technologies make it easier to create remarkably realistic voice clones, such scams are on the rise.
A year or so after the call with my niece’s voice, my father got another call with a similar plotline; this time the caller posed as the contractor my father had hired to remodel his bathroom. And a few years after that, I received a call from someone posing as a former colleague in a similar predicament. “Nice try, jackass!” I told the caller, and I never heard from him again.
And then there is the now-infamous case of an employee transferring $25 million in corporate funds to five bank accounts after a virtual meeting with what turned out to be audio-video deepfakes of senior management.
All this proves a point made in our second feature, “Safety and Ethical Concerns Loom Large in Voice Cloning.” In the article, Bob Long, president of the Americas at Daon, a biometrics and identity assurance software company, warns, “Everyone needs to be concerned about the technology being used for nefarious purposes. It’s no longer where you’re looking at just famous people; it’s everyone.”
So where do the callers get the voice samples they use to create their deepfakes? In my niece’s case, we’re pretty sure the sample came from a video she had posted on social media, but fraudsters also record phone calls or hack into the files that companies use to store their customers’ and employees’ voice prints for biometric authentication. Since many of today’s technologies require only a few seconds of audio to re-create a real person’s voice, the samples are relatively easy to get.
I should point out that not all uses of voice cloning are malign. Indeed, the technology is far more often used for good: dubbing, translation, intelligent voice assistants, screen readers and other assistive technologies, toys, robotics, video games, learning assistants, and many other applications. It has even been used to preserve the voices of people at risk of losing the ability to speak due to disease and to help preserve dying languages.
The speech industry is going to great lengths to develop technologies that can detect nefarious deepfakes and prevent, counter, and mitigate the harm they cause. Measures like liveness detection, watermarking, advanced analytics, and voice scrambling have helped. So have technologies designed to safeguard voice file databases from unauthorized access, but more can and should be done.
Rapid advancements in artificial intelligence bring new challenges to the speech technology industry, and you should expect this trend to continue. “Fortunately, we are also working hard to research and develop new ways to leverage AI to its full potential to counter AI-powered fraud,” Alexey Khitrov, CEO and cofounder of ID R&D, assures us in the article.