Deepfakes Will Remain a Real Concern
In October 2001, I moderated a SpeechTEK Conference panel in New York titled “Trends in Speech Technology.” This was one month after 9/11, so it was still pretty fresh. Pictures of missing persons were still wallpapered on the windows and walls of buildings. Pedestrians still stood silently at the sound of sirens as emergency vehicles passed by.
For the panelists, my most pressing questions involved what we now call “deepfakes,” although that term did not yet exist. I wanted to explore the potential dangers of speech technology evolving to the point where it sounded so real that it would be impossible to discern whether the audio we were hearing came from the individual it purported to be or was entirely fabricated.
Deepfakes refer to the use of artificial intelligence to create or manipulate media, including videos and audio recordings. They’re not necessarily new, and the creators don’t necessarily have malicious intent. They are, however, becoming more sophisticated and difficult to detect.
Back then, the concern would have been a deepfake of Saddam Hussein spouting rhetoric resulting in some major conflict. Today, the players are different, but the danger is even greater, in part because of advances in the technology.
“We already had machine learning in 2012,” points out CJ, an IT professional with a background in coding. But we had tools to manipulate audio, photos, and videos long before then. Before models could quickly generate content, users could edit images in Photoshop. And prior to that, one could set up a camera at just the right angle and make a martini glass look bigger than the human holding it.
Manipulating media used to be reserved for a small subset of the population because the technology was expensive and required a specific skill set. Now it is available to anyone who can navigate to a website.
Maybe someone digitally manipulates a photo to remove red eye or to make a face look slimmer. But what if the same deep learning techniques are used in journalism? It’s one thing to edit strangers on a crowded beach out of a picture from your trip to Hawaii, but another thing entirely to add or remove people from an image used in a news story.
Like pretty much all technology, it’s a slippery slope. The question is, for what purpose is the technology being used? Does the creator have malicious intent?
“The technical detection of the manipulation is similar; we look at pixel distribution, artifacts,” says Gang Wang, an engineering professor at the University of Illinois’ Grainger College of Engineering. “It’s the intent behind the manipulation that’s much harder to identify.”
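To make the pixel-level idea a bit more concrete, here is a minimal sketch of one such signal, error level analysis: re-save a JPEG at a known quality and compare it to the original, since regions edited after the original compression tend to re-compress differently and stand out. This is an illustrative example only, not a description of Wang’s or any particular lab’s detector; the file name is hypothetical, and it assumes the Pillow and NumPy libraries are installed.

```python
# Illustrative sketch of a pixel-level manipulation signal (error level analysis).
# Not any specific lab's method; "photo.jpg" is a hypothetical input file.
import io

import numpy as np
from PIL import Image, ImageChops


def error_level_analysis(path, quality=90):
    """Re-save a JPEG at a fixed quality and measure per-pixel differences.

    Regions edited after the original compression tend to re-compress
    differently, so their error levels stand out from the rest of the image.
    """
    original = Image.open(path).convert("RGB")

    # Re-compress the image in memory and reload it.
    buffer = io.BytesIO()
    original.save(buffer, format="JPEG", quality=quality)
    buffer.seek(0)
    resaved = Image.open(buffer)

    # Per-pixel absolute difference between the original and the re-saved copy.
    diff = np.asarray(ImageChops.difference(original, resaved), dtype=np.float32)

    # A patch whose error level sits far above the image-wide average is a
    # candidate for closer human or model-based inspection.
    return diff.mean(), diff.max()


mean_err, max_err = error_level_analysis("photo.jpg")
print(f"mean error level: {mean_err:.2f}, max: {max_err:.2f}")
```

A check like this only flags inconsistencies in the pixels; as Wang notes, it says nothing about why the image was changed.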
If we can determine the context of why the manipulation happened—the intent—and define misuse, we can create policy, although that may trigger legal issues.
Earlier this year, in what CJ calls “the logical progression of the Nigerian prince,” a British company forked over $25 million after scammers, in a very sophisticated operation, convinced an employee to join a Zoom call populated by deepfakes of the company’s CFO and other employees.
As the technology behind deepfakes continues to improve, it would be wishful thinking to believe that bad actors will stop exploiting it.
“It’s been two years since ChatGPT came on the scene, and it’s improved exponentially since then,” says Chris Winder, an IT specialist and a sci-fi author. The chatbot reached 100 million monthly active users faster than TikTok, and we now have commercials built completely by AI.
CJ, Wang, and Winder agree there’s no going back from the technology. “Programming machine learning is 99 percent open source,” says CJ. “Even if companies could stop tomorrow, all it takes is one company that does not care about ethics to pick up where the others left off.”
To avoid getting caught up in deepfake misinformation, it’s wise to view social media platforms and even the news with skepticism, and to research the given topic before forming an opinion. Additionally, the 24-hour rule—refraining from reacting for 24 hours—allows time for mis- or disinformation to be corrected.
James Madison said, “Some degree of abuse is inseparable from the proper use of everything.” The trick here is finding the balance among policy, legal, ethical, and privacy issues to minimize the abuse while still embracing individual sovereignty, which I’ll discuss in a subsequent column.
Robin Springer is an attorney and the president of Computer Talk, a consulting firm specializing in implementation of speech recognition technology and services, with a commitment to shifting the paradigm of disability through awareness and education. She can be reached at (888) 999-9161 or contactus@comptalk.com.