Succeeding in the Battle Against Deepfakes
Voice deepfakes have become one of the largest risks facing modern businesses and their call centers, with attacks using this type of deception surging 3,000 percent in the past year alone, according to research by Onfido, a digital identity verification solutions provider.
Scams using deepfakes are targeting consumers and businesses alike. These highly realistic digital audio or video forgeries are created using artificial intelligence to analyze voice data, identify the unique patterns and characteristics of a target voice, and re-create a clone of that voice that can be made to say anything the scammers want. The average loss for organizations that succumbed to deepfake attacks was $450,000, while the average loss for financial services firms was more than $603,000, according to research from Regula, which provides identity verification solutions and forensic devices for automated ID document verification and biometric checks. One in 10 organizations reported losses exceeding $1 million, the firm found, underscoring just how severe the problem is.
Pindrop Security, a voice authentication and security systems provider, forecasts that deepfake fraud will continue to rise, posing a $5 billion risk to contact centers in the United States alone. The increasing sophistication of text-to-speech systems, combined with low-cost synthetic speech technology, presents ongoing challenges.
Small companies can be particularly vulnerable because “everybody knows everybody,” according to Aaron Painter, CEO of Nametag, an online identity verification solutions provider. So a fraudster using a deepfake voice can relatively easily trick someone at the help desk into handing over a password or other security credentials simply by claiming to have misplaced or forgotten them, he adds.
Deepfake scams like this have been on the rise for years, but they have benefited enormously from advances in artificial intelligence in just the past year or so. With the newer technologies, scammers can create convincing voice impersonations and use them to orchestrate multimillion-dollar frauds with very little effort.
In 2020, platforms like Descript required 20 minutes of content to produce an audio reproduction; now a few seconds of a podcast recording gives a malicious actor enough to work with, according to Painter. “If you’re trying to trick an advanced voice biometrics system, then you might need a higher quality. But you don’t necessarily always need to have very high quality to be successful in your attack.
“Deepfake technology, generative AI, and AI in general have given bad actors superpowers. They have new tools that allow them to perpetrate even more sophisticated frauds, probably with less effort than before,” Painter continues.
Srinivas Tummalapenta, an IBM distinguished engineer and chief technology officer of IBM Security Services, agrees. “Bad actors have a low barrier to entry,” he warns in a company blog. With just $5 and a minute-long voice sample, scammers can now impersonate CEOs, potentially wreaking havoc on company finances and reputations, he adds.
Although deepfake technology allows bad actors to conduct full conversations in cloned voices, most scam calls using the technology are actually far less sophisticated. Using a combination of traditional social engineering and synthetic voices, experienced scammers can navigate interactive voice response menus and steal basic account details.
Some deepfake voice recordings revealed that scam callers were using their own voicebots to mimic IVRs, according to Pindrop. Rather than attempting to answer any of the automated prompts, the callers were simply repeating them back to the IVR.
Pindrop kept a record of these calls and discovered that following the initial mimicking, contact centers received similar calls, but this time the callers repeated the prompts in a cloned version of the IVR’s voice.
“The ultimate goal is to take over the account and cause harm,” Painter explains. “It’s used commonly in many kinds of social engineering attacks. That is the main reason why we can no longer trust what we see or hear through digital channels.”
And consumers can no longer rely on companies to protect them, and vice versa.
“A recent survey showed that 86 percent of call centers surveyed are concerned about the risk of deepfakes, with 66 percent lacking confidence that their organizations could identify them,” Painter says.
“The significant gap between confidence in detecting deepfakes and the reality of financial losses shows that many organizations are underprepared for the sophistication of these attacks,” said Ihar Kliashchou, chief technology officer of Regula, in a statement.
A major problem in combating the threat of audio and video deepfakes is that, unlike phishing emails, they don’t come with red flags like misspellings, grammatical errors, or strange links, Painter says.
Follow the Clues
But that’s not to say that it’s completely impossible to identify deepfakes.
WellSaid Labs, an AI text-to-speech technology company, provides the following clues that could indicate that a voice sample is a deepfake:
- Unsettling silences. Elongated pauses are often the first warning sign of an audio deepfake. Scammers hiding behind manipulated audio must type to generate the simulated voice, a process that predictably creates delays and awkward halts in conversation. These pauses can be strikingly unnatural; a rough way to check for them programmatically appears after this list.
- Artificial timbre. Be wary of voices that sound synthetic, carry strange accents, or deviate from the speaker’s familiar speech patterns.
- Urgent emotional ambush. Be suspicious of unsolicited outreach whispering tales of loved ones in jeopardy or pressing for personal information, especially if money is involved. “Any communication that seems strangely misaligned with the character of the sender or that employs high-pressure, emotionally charged tactics should prompt you to pause, disconnect, and reestablish contact through trusted channels,” WellSaid advises.
- Visual glitches. Deepfakes, especially those of lesser quality, might inadvertently expose their illusion through a mosaic of visual glitches, such as isolated blurs, facial double edges, fluctuating video quality, or inconsistent background and lighting.
- Suspicious sources. Who approached whom? From where did they emerge? These are questions to mull over when unsolicited digital specters of persons or companies materialize. WellSaid recommends maintaining a fortified skepticism, particularly when encountering potentially polarizing content involving celebrities or politicians. Ensure you scrutinize and verify the true source before accepting digital interactions at face value.
- Distorted decibels. Deepfakes tend to have a hodgepodge of irregularities, such as choppy sentences, peculiar inflections, abnormal phrasing, or incongruent background noises, that can indicate their synthetic origins.
- Unnatural movement. Deepfakes often fail to mimic the seamless flow of natural human movement. Anomalies such as erratic eye movements, infrequent blinking, and misaligned facial features or expressions undermine their believability. Additionally, deepfakes traditionally struggle to render the mouth accurately.
- Inconsistent imagery. If an image looks suspicious, scrutinizing the finer details can reveal whether it’s a fake. Manipulated images often fumble with accurate portrayals of light and shadow and might present peculiarities like distorted hands or inconsistent skin tones and textures across different body parts.
- Low-quality video. Deepfakes, limited by their frame-by-frame generation, often employ minimal facial movements. Typically, they opt for predominantly front-facing visuals to maintain their illusion. They also tend to be lower quality to mask imperfections and artificially generated components. Try switching to the highest available resolution and viewing on a large screen.
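The first of those clues, unsettling silences, lends itself to simple automation. What follows is a minimal sketch in Python, assuming the librosa audio library is available; the file name and pause threshold are hypothetical illustration values, not anything WellSaid or other vendors prescribe.

```python
# silence_scan.py -- flag suspiciously long pauses in a voice recording.
# A rough illustration of the "unsettling silences" clue; the file name
# and thresholds below are hypothetical, not values from any vendor.
import librosa

AUDIO_PATH = "suspect_call.wav"   # hypothetical recording
PAUSE_LIMIT_SEC = 2.0             # hypothetical threshold for an "awkward halt"

# Load the audio and find the non-silent stretches (anything louder
# than 30 dB below peak counts as speech here).
samples, rate = librosa.load(AUDIO_PATH, sr=16000)
speech_spans = librosa.effects.split(samples, top_db=30)

# Measure the gap between the end of one speech span and the start of
# the next; long gaps are the halts a typed-to-speech scam produces.
for (_, prev_end), (next_start, _) in zip(speech_spans, speech_spans[1:]):
    gap = (next_start - prev_end) / rate
    if gap > PAUSE_LIMIT_SEC:
        print(f"Suspicious {gap:.1f}s pause at {prev_end / rate:.1f}s")
```

A long gap by itself proves nothing, of course; it is one signal to weigh alongside the other clues above.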
Experts agree that companies need to use AI, as well as other technology, to fight the threat of deepfakes. A whole crop of technology vendors now focuses exclusively on deepfake detection.
AI to the Rescue
“The problem is that it’s an arms race or a cat-and-mouse game. It’s AI vs. AI. Someone is always going to be slightly ahead. And more often today, it’s the bad actor who is one step ahead. The bad actor is using slightly better AI technology than the detector. It’s an arms race that somebody will lose, and it’s most likely to be the company,” Painter says.
So experts, including some who work for companies providing deepfake detection technology, recommend layering multiple defenses to improve the odds against deepfake threats.
One of the leading technologies is liveness detection, which is capable of deciding whether a voice comes from a live person or a spoof, such as a replayed recording.
“Detecting playback spoofing attacks in speaker verification systems can be a big challenge, but liveness detection can identify these with a high level of accuracy through multiple methods, including intrasession voice variation,” says Mohamed Lazzouni, chief technology officer of Aware, an identity verification and biometrics provider.
With liveness detection, the user speaks a phrase to get verified, and the system captures an audio sample from that phrase. The speaker is then prompted to repeat a random part of the phrase, and the system compares the two samples to produce a liveness detection score.
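To make that flow concrete, here is a heavily simplified sketch of the challenge-response loop in Python. It is an outline under stated assumptions, not Aware’s implementation: record_audio and voices_match are hypothetical stand-ins for a real capture pipeline and speaker-verification model.

```python
# liveness_check.py -- simplified challenge-response liveness flow.
# Hypothetical sketch of the process described above, not any vendor's
# actual system; record_audio() and voices_match() are stand-ins for a
# real microphone pipeline and speaker-verification model.
import random

def liveness_check(record_audio, voices_match, threshold=0.8):
    phrase = "my voice is my passport verify me"
    words = phrase.split()

    # Step 1: capture the full verification phrase.
    print(f"Please say: '{phrase}'")
    full_sample = record_audio()

    # Step 2: challenge with a random fragment -- a replayed recording
    # cannot predict which part will be requested.
    start = random.randrange(len(words) - 2)
    fragment = " ".join(words[start:start + 3])
    print(f"Now repeat just: '{fragment}'")
    fragment_sample = record_audio()

    # Step 3: score how well the two captures match the same live
    # speaker; intrasession voice variation feeds into this score.
    score = voices_match(full_sample, fragment_sample)
    return score >= threshold

# Demo with dummy stand-ins (always "matches"):
# liveness_check(record_audio=lambda: b"audio", voices_match=lambda a, b: 1.0)
```

The random fragment is the key design choice: it forces the caller to produce new audio on demand, which a prerecorded or replayed clip cannot do.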
For deepfakes that have video and audio components, the first line of defense employs dynamic verification, moving beyond static passport-style photos to real-time interaction requirements, says Titus Capilnean, vice president of go-to-market at Civic Technologies, a provider of identity management tools for web3. With dynamic verification technology, companies can ask customers to perform specific actions during verification, such as adjusting camera positioning or following movement prompts. This active authentication makes it substantially more difficult for fraudsters to deploy prerecorded or AI-generated content, he says.
“Source validation forms the second critical layer, ensuring explicit consent and authentic identity claims,” Capilnean adds. “This prevents unauthorized impersonation, even if a deepfake appears convincing. Someone creating a highly realistic deepfake of a public figure couldn’t bypass security without possessing the actual individual’s verified credentials and explicit permission.”
The final component of this layered approach involves immutable blockchain records that create tamper-proof links between verified identities and their digital assets. Once authentication is complete, blockchain technology ensures this verification can’t be duplicated or manipulated for unauthorized use.
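The tamper-evidence idea can be illustrated without any particular blockchain. The toy Python sketch below, with made-up record contents, links each verification record to the previous one by hash, so altering any earlier entry breaks every later link:

```python
# record_chain.py -- toy illustration of a tamper-evident verification log.
# Not any vendor's blockchain; it only shows why linked hashes make a
# record hard to alter silently. Record contents are made up.
import hashlib
import json

def append_record(chain, record):
    """Link a record to the chain by hashing it with the previous hash."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    payload = json.dumps(record, sort_keys=True) + prev_hash
    entry = {"record": record, "prev": prev_hash,
             "hash": hashlib.sha256(payload.encode()).hexdigest()}
    chain.append(entry)
    return chain

chain = []
append_record(chain, {"user": "alice", "verified": True, "ts": "2024-05-01"})
append_record(chain, {"user": "bob", "verified": True, "ts": "2024-05-02"})
# Altering the first record would change its hash and break every later
# link, so tampering is evident to anyone replaying the chain.
```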
Rex Johnson, executive director and practice leader of cybersecurity consulting services at CAI, recommends using additional technologies, including the following defenses, to thwart deepfakes:
- Digital watermarking. Embedding unique digital signatures into original content can help verify the content’s authenticity. This method is effective for tracing and validating images and videos.
- Metadata examination. Analyzing a file’s metadata can reveal that content was manipulated, helping to flag potential deepfakes (see the sketch after this list).
- Blockchain technology. This provides a timestamped, permanent record of the history and ownership of files, making it easy to track changes.
- Real-time verification systems. Technologies such as multifactor authentication, biometric verification, and blockchain can help confirm the legitimacy of communications and transactions.
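Of those four, metadata examination is the simplest to illustrate. As a rough sketch, assuming the Pillow imaging library and a hypothetical file name, the following Python lists an image’s EXIF tags and flags fields that editing tools commonly stamp:

```python
# metadata_scan.py -- inspect an image's EXIF metadata for editing traces.
# Illustrative only; "photo.jpg" is a hypothetical file name, and a
# missing or clean EXIF block proves nothing by itself.
from PIL import Image
from PIL.ExifTags import TAGS

SUSPICIOUS_TAGS = {"Software", "ProcessingSoftware"}  # fields editors often stamp

image = Image.open("photo.jpg")
exif = image.getexif()

if not exif:
    print("No EXIF metadata at all -- common for generated or scrubbed images.")

for tag_id, value in exif.items():
    name = TAGS.get(tag_id, str(tag_id))
    marker = "  <-- check this" if name in SUSPICIOUS_TAGS else ""
    print(f"{name}: {value}{marker}")
```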
Painter also supports a multipronged approach. “There are things that are proven, besides AI, like cryptography, which has been around for years, biometrics, and other things that can give you a more holistic way of preventing a deepfake attack,” he says. “Part of our technology uses mobile phones for the user to take a selfie. By doing that on a mobile phone, you get to benefit from the cryptography on the mobile phone, the biometrics on the mobile phone, and the AI on the mobile phone, in a way that previously didn’t exist.”
Banks are starting to do this, sending codes to mobile devices that customers have registered with them in advance.
It’s a promising sign, given that most deepfake scams have a financial motivation.
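For illustration, a minimal sketch of such a one-time-code check follows; the six-digit length and five-minute expiry are arbitrary values, not any bank’s actual parameters.

```python
# otp_check.py -- minimal one-time-code flow of the kind banks use to
# tie a transaction to a pre-registered device. The 6-digit length and
# 5-minute expiry are arbitrary illustration values.
import secrets
import time

CODE_TTL_SEC = 300  # codes expire after five minutes

def issue_code():
    """Generate a random 6-digit code and its expiry time."""
    code = f"{secrets.randbelow(10**6):06d}"
    return code, time.time() + CODE_TTL_SEC

def verify_code(expected, expiry, submitted):
    """Accept the code only if it matches and has not expired."""
    if time.time() > expiry:
        return False
    # Constant-time comparison avoids leaking digits via timing.
    return secrets.compare_digest(expected, submitted)

# Usage: the code would be delivered by SMS or push notification to the
# device the customer registered in advance, then checked on return.
code, expiry = issue_code()
print(f"(sent to registered device): {code}")
print("verified:", verify_code(code, expiry, input("Enter code: ")))
```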
But for banks and all other companies, defending against deepfakes requires more than just technology, experts agree. Education and awareness remain critical components of an effective defense strategy. “We’re educating people,” IBM’s Tummalapenta says. “The more attacks they see, the more they understand the threat.”
A multilayered system particularly benefits platforms handling sensitive transactions or requiring high-security verification, Capilnean says. “By implementing these advanced identity checks, organizations can significantly reduce the risk of deepfake-based fraud while maintaining user privacy.”
Phillip Britt is a freelance writer based in the Chicago area. He can be reached at spenterprises1@comcast.net.
5 Companies That Matter
- Pindrop Security. A provider of contact center security solutions, offering secure authentication, fraud detection, and deepfake detection.
- Behavioral Signals. A developer of voice analysis technology to analyze human behavior from voice data to decipher emotions, anticipate intent, and detect threats.
- Reality Defender. A provider of a platform to proactively detect AI-generated media and manipulated content across audio, video, images, and text.
- Resemble.AI. A provider of technology for AI voice cloning, text-to-speech, and speech-to-speech conversions.
- Google DeepMind. Google’s artificial intelligence research lab.