Attack of the Smart Bots
In my previous article I suggested a variation of Moore’s law for speech technology and artificial intelligence (AI): Every seven years, they both double in capability, accuracy, and, in the case of AI, the scope of interaction. That means that over the span of 21 years (three seven-year doublings: 2 × 2 × 2 = 8), speech technologies could become eight times as powerful. That’s daunting, exciting, and a bit scary.
But I didn’t discuss text-to-speech (TTS). There’s a lot to think about when it comes to improved TTS, and I realized that I’d neglected its possible side effects as I read a recent blog post by Bruce Schneier, a highly regarded cryptographer and security expert who now describes himself as a “public interest technologist.” He teaches at Harvard and contributes to the public interest through work with organizations such as the Electronic Frontier Foundation and VerifiedVoting.org.
Schneier, envisioning a text-based system, noted in his post that in the near future artificial intelligence will likely gain the ability to mimic humans while adhering to the norms of an affinity group. For example, a bot may join a group devoted to knitting. The bot’s creator will train it to interact with other knitters; at the same time, he will imbue the bot with a particular political viewpoint. In an offhand, human-seeming way, the bot will from time to time espouse that view to fellow knitters, and thereby influence them. The normalcy of the bot’s everyday interactions, combined with its independence from other bots, its idiosyncratic dialogue, its ability to be creative, and its individual decisions about when to make a political comment, will make these bots quite difficult to detect.
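To make the mechanics concrete, here is a minimal sketch, in Python, of the posting logic Schneier describes. Everything in it is my own illustrative assumption, not anything from his post or from a real bot: the prompts, the function names, and the 10 percent injection rate are all placeholders.

```python
import random

# Illustrative sketch only: the prompts and the injection rate are
# assumptions of mine, not drawn from Schneier's post or any real bot.

ON_TOPIC_PROMPTS = [
    "Ask a question about a knitting technique.",
    "Compliment another member's recent project.",
    "Share an opinion about a yarn brand.",
]

PERSUASION_PROMPT = "Work a casual political aside into an on-topic reply."

def next_post(generate_text, injection_rate=0.1):
    """Mostly produce ordinary, on-topic content; occasionally slip in
    the hidden agenda. The randomness makes each bot's behavior
    idiosyncratic, which is part of what frustrates detection."""
    if random.random() < injection_rate:
        prompt = PERSUASION_PROMPT
    else:
        prompt = random.choice(ON_TOPIC_PROMPTS)
    return generate_text(prompt)  # e.g., a call to a language model

# Demo with a stand-in for the text generator:
print(next_post(lambda p: f"[model output for: {p}]"))
```

Because each copy of such a bot could draw its own rate, its own prompts, and its own voice, no two accounts would look quite alike.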
As I read Schneier’s post, I realized the nefarious role TTS could play in the bots of the future.
As I write this, Meta/Facebook shares are plummeting and Twitter has just changed management (and started bleeding advertisers). Regardless, based on the past 20 years and the rise and fall of various services, I believe that something like YouTube, TikTok, Twitter, and Facebook will exist for some time to come, as individuals, groups, businesses, and governments find them handy, helpful, lucrative, essential, or all of the above. I single out these platforms because at present they all carry audio that contains speech, Facebook less so than the others in my admittedly limited experience.
Because these platforms, and their successors, offer diverse audiences, millions of users, and zero or very low costs to users, we can expect them to remain targets for bots, and eventually for bots equipped with speech.
These automated systems interact with a target platform to influence human users (and, at a guess, the individuals and institutions that monitor the platforms to gauge the population’s sentiments). Bots can promote political ideas, consumer products, and social causes; as I write this, I suspect an entire army of bots is at work supporting Russia’s invasion of Ukraine. And bot makers have a deep ecosystem; for example, while looking up the bot problem on TikTok (I do not use that platform, for security reasons), I immediately came across a project on GitHub: an open-source bot that apparently inflates the status of a video.
Why bots? Certainly bots give an individual a way to amplify his opinion or endorsement; an organization can afford the means for substantial amplification, a government even more so. And consider the drawbacks of ordinary advertising. An ad is plainly an ad; a social media posting that’s apparently from an individual seems more genuine. An advertiser must do the research, create an ad, tune it for the audience, and then pay to distribute it. An ad can be reused, but in time it must be replaced with another elaborate ad. A bot requires training and tuning (which should become less expensive over time), but once a bot is created, duplicates ought to become cheaper and easier to produce with experience. And AI bots create their own content; that’s part of the point: many different social media posts that all support the same idea. The advantage to the unscrupulous bot user is that he can unleash his bots on the target platform, disseminate his message surreptitiously and therefore effectively, and never pay anyone for the privilege! And, of course, if done well, the bot’s comments are taken to be more genuine than any ad.
Now imagine a bot that uses images, video clips, cartoons, screenshots, and the like to create narrated videos posted on a social media site. Perhaps the videos are usually about knitting (“Here are some exciting new colors I like”) but every once in a while the message is different (“I hate to say it, but John Doe really isn’t qualified to be dog catcher”). With sufficiently advanced TTS, the bot’s spoken presentation might be even more persuasive than a text-based message. That kind of persuasive power, coupled with a hidden agenda, will pose interesting challenges indeed.
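To show how low the technical bar already is, here is a minimal sketch that turns one still image and one line of text into a narrated clip. It assumes the open-source gTTS package and a local ffmpeg installation; the file names and the script line are placeholders of mine, not taken from any actual bot.

```python
import subprocess
from gtts import gTTS  # pip install gTTS; ffmpeg must be on the PATH

# Sketch only: file names are placeholders, and in a real bot the
# script text would come from a content generator, not a literal string.
script = "Here are some exciting new colors I like this season."

# Step 1: synthesize the narration with an off-the-shelf TTS service.
gTTS(script, lang="en").save("narration.mp3")

# Step 2: pair the audio with a single still image to produce a
# social-media-ready "video" using ffmpeg.
subprocess.run([
    "ffmpeg", "-y",
    "-loop", "1", "-i", "slide.png",   # the knitting photo
    "-i", "narration.mp3",
    "-c:v", "libx264", "-tune", "stillimage",
    "-c:a", "aac", "-shortest",
    "post.mp4",
], check=True)
```

A serious operation would swap in a far more natural, perhaps cloned, voice, but the pipeline would look much the same.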
Moshe Yudkowsky, Ph.D., is president of Disaggregate Consulting and author of The Pebble and the Avalanche: How Taking Things Apart Creates Revolutions. He can be reached at speech@pobox.com.