Still think those synthetic AI voices are a fantastic idea for your media production? Maybe you’d better read this.
This goes out to anyone whose job could potentially involve creating a multimedia production – typically a video.
Maybe you’re in corporate communications, marketing or brand management. Perhaps your game is producing e-learning modules or employee training. Or spearheading a fundraising campaign for your non-profit.
Whatever it is, you may be asked to craft something unique that might serve as the public image of your brand, your company, your mission. Even if it’s only seen and heard internally, the goal is no different: a piece that resonates with your audience and effectively conveys your message.
But you’re also on a budget and facing a looming deadline.
That’s where artificial intelligence – AI – usually comes in. AI is often hyped as a cutting-edge technological marvel, the glorious magic bullet that’ll solve all your problems. Your ticket to getting stuff done quick, cheap and easy.
AI has found its way into the media production world, bringing efficiencies in areas like graphic design, script writing and video editing.
This includes the use of text-to-speech programs that generate synthetic voices for narration, thereby – allegedly – saving money, time and effort otherwise spent on hiring a real, live human to do that voice over.
Perfect solution, right? WRONG. Here’s why: there’s mounting evidence that people really hate AI voices.
A Tell-tale Leading Indicator: Look to the Video Game Community
A recent article in video game news/reviews website The Gamer pulled no punches on the AI voice over issue.
I’m not big into video games – nothing against them, I just don’t really play. However, I am fascinated by the chatter and buzz among gamers. As you tune into their conversations, you discover they’re an opinionated bunch. They’re also quite savvy, well-informed, articulate and passionate about their hobby.
These people take gaming seriously. They sweat the details. They demand an immersive, realistic experience, and they’re not shy about expressing pointed criticisms when things don’t meet their expectations.
The commentary can get very granular – such as their focus on audio quality, and specifically, yep – the voice overs. Apparently, that’s a touchy subject.
For proof I point you to a recent minor uproar over a new offering called Destiny: Rising. As of this writing, it’s in the “closed Alpha” stage, which is techno slang meaning the game is still in development, but has been released in semi-finished form to an exclusive, limited group of testers to solicit their feedback.
Almost immediately users began reacting to the game developer’s heavy reliance on AI-voiced characters. Word spread fast, thanks in no small part to this article titled “I’m Horrified by Destiny: Rising’s AI Voice Overs and You Should Be Too.” (No ambiguity there!) The author emphatically declares there is “no place for AI voice acting” in this, or any video game:
It’s incredibly jarring the first time you hear it. After a couple of encounters with AI-voiced characters that felt off, I eventually met Ikora, a classic Destiny character, and was subjected to a truly abysmal AI ‘performance,’ and it made me want to instantly uninstall and never look back…worst of all the AI dialogue is mixed in with actual human VO. Ikora will start out talking like a normal person, and then after a few lines, swap to the lifeless drone of a robot making sounds that mimic human speech.
One review on YouTube also singled out the AI voices, saying “…and then in the middle of the conversation, it will change over to a GOD-AWFUL AI robot voice which is both extremely noticeable and takes me out of the moment each and every time.”
Another gamer’s damning judgment was that it has “some of the worst writing and voice acting I’ve ever seen,” including the AI-voiced characters. “It’s like literally speaking into a Speak ‘n Spell.”
There was enough blowback that the company released an official statement assuring the gaming community that the AI voices are only intended as temporary “place holders,” with a promise that those lines will eventually be recorded by actual voice actors.
One of the YouTube reviewers expressed relief: “…and man, thank God – because that shit is really annoying and weird and distracting.”
Gamers crave authenticity. That voice inside the game needs to hook them and keep them emotionally invested. As for the AI voice? Pardon the pun, but the players…they don’t play dat.
Others Are Noticing…And They Don’t Like It Either
It would be easy to write off these gamers and their obsession with the sensory experience as hyper-sensitive, nitpicky outliers who don’t accurately represent the general public. Yet they’re not alone.
If you spend any time wandering the internet, you’ll soon spot others venting their irritation with these creepy, emotionless AI voices. YouTubers with entire videos voiced by AI are routinely crucified in the comments sections. People mock it on TikTok. I’ve read Reddit forums where posters say as soon as they hear an AI voice in a video, they click off it and move on.
Anecdotally, I’ve heard from other voice actors – and a few video producers – that a fair number of media production pros favor the temporary “place holder” strategy. They do utilize the AI voice, but only as a tool – a raw first draft they fully intend to scrap once they can hire a flesh-and-blood voice actor to infuse the emotion and tone they need.
Meanwhile, I’m both amused and disgusted watching the ongoing glut of videos circulating online that review the latest and supposedly improved versions of some of these text-to-speech AI voice generators. They rave in amazement at how remarkably “lifelike” they sound, they get giddy over the wide variety of choices in voice types, and insist you can’t distinguish them from a real human.
And then they demo the product.
Several of the voices are clearly robotic and flat. The enunciation is occasionally muddled, and certain words are mispronounced, often with the stress on the wrong syllable – such as saying “con-TENT” instead of “CON-tent.” Sentences crash into one another illogically, without the natural pauses, inflections and shifts in timbre and pacing you’d expect.
After a few minutes, you can’t help but detect how one-dimensional, monotonous and sterile it all sounds.
I’m not sure if these reviewers can’t separate their infatuation with the technology from its glaring flaws, or if perhaps they could be paid shills, but it feels like they’re hearing what they want to hear, despite the flubs and hiccups.
Even a series of manual controls allowing you to adjust variables like pitch and speed does little, if anything, to improve upon this soulless caricature. Worst of all: there are now drop-down menus allegedly allowing you to “select” an emotion. You can apply options like angry, sad, excited, curious.
Here’s the problem:
Angry – doesn’t sound angry.
Sad – doesn’t feel sad.
Excited – lacks genuine excitement. Actually, it borders on unnervingly manic.
Curious – sounds like it doesn’t give a shit.
How the hell can you pigeonhole an emotion into a drop-down menu? Predictably, it comes off as hopelessly clunky and primitive.
Here’s an emotional option that jumps to mind: EMBARRASSMENT.
Trying to digitally capture and bottle all the subtleties and intricacies of the human voice and the myriad ways it can transmit something as intimate as emotions is a fool’s errand.
Go Ahead – Alienate Your Audience. Undermine Your Message. Sabotage Your Brand.
Still want to go the AI route? Be my guest. You might get away with inserting that synthetic voice in a project or two, and if you’re lucky no one will be the wiser. Maybe it works in limited doses, and maybe it doesn’t. Maybe you save a few bucks and your boss says it’s passable – good enough.
But will “good enough” be good enough for your intended recipients?
If you’re doing anything that requires even an ounce of authenticity, nuance or sincerity, you’ll be in trouble. If you need to tell a compelling story, sell a high-end concept or product, raise money, hammer home an urgent message or hold your audience’s rapt attention for more than a hot minute, AI will eventually screw you.
While you’re patting yourself on the back for your ingenuity in cutting corners, people, perhaps unconsciously at first, will pick up on the deception. They’ll sense something’s off, and that somehow you stripped all the humanity and connection out of their experience. They’ll feel that you insulted their intelligence. That you decided they didn’t rate high enough for a real person to engage them.
They’ll resent that.
Now, honestly – is doing that to your customers or clients a risk you want to take?
May I make a humble suggestion? Heed the growing backlash.