
Artificial Intelligence Replacing Voiceovers? Not So Smart

AI: Worrying VOs

Every so often, a hot topic sparks passionate debate in today’s highly interconnected voice acting community.

I’ve noticed just such a discussion lately echoing through private Facebook groups, podcasts and even the major VO conferences.  It concerns the use of artificial intelligence (AI) to create synthetic but realistic-sounding voices to narrate a project, instead of hiring a live human being.

This technology is known as a text-to-speech voice generator or an AI voice generator, and several companies have emerged with some rather impressive offerings.  You’d swear it’s an actual person talking.  Imagine that the pleasant, perfectly enunciating male voice belongs to a sophisticated gentleman behind the microphone.  There are female voices too, with just enough texture and maybe even a hint of sexy to suggest she must be a beautiful, educated woman – the kind with whom you might like to engage in stimulating conversation over your favorite bottle of wine.

But guess what:  they’re not real.

Some see this not as a problem, but as a solution. These gadgets are marketed to those who want a voice on the cheap. You buy the software once and learn to configure it so that when you feed it your text, it speaks it back to you.  Use it repeatedly and save the big bucks you would otherwise spend on a professional voiceover talent.

And that’s precisely what has some of my peers in a twist.


Frankly, We’ve Been Here Before

I’ve seen the commentary, ranging from mild concern to sheer panic:

Will AI impact our profession?

How can we expect to compete?

Is this the end of the voiceover industry?

Should we lobby lawmakers to stop this?

Sorry, but this all sounds very familiar to me.  Similar questions and howls of despair were heard in other segments of the economy when new technology arrived on the scene.

I used to work in radio.  Back in the early 2000s, unemployed deejays seethed over the growing practice of automated “voice tracking” utilized by Clear Channel (derisively nicknamed “Cheap Channel” by many a radio employee cursing it under their breath) and other big radio conglomerates.  These companies farmed out the task of pre-recording all the DJ chit-chat between songs, as digital audio files, to a handful of individuals.  Those files got dropped into a software program, where they could be sandwiched between similarly digitized versions of the songs and commercials, so management didn’t have to hire an hourly employee to sit there in person and spin records all day.  A computer did it all.  Instant radio!

Sometimes they directed staff to package generic audio content for radio stations in multiple cities. “Stacking” hours of carefully curated song playlists and voice tracks streamlined operations and budgets, but it also killed on-air jobs.  Suddenly your friendly local disc jockey was obsolete, and you were unwittingly listening not to a live local broadcast, but to a pre-produced feed piped in from an unknown location.

Perhaps you’ve read about how automation could eliminate factory workers on the assembly line, or how high-tech ordering kiosks and apps might threaten entry-level jobs at the counter of your favorite fast-food joint.  Maybe there’s some validity to that.  I don’t know; I’m not an expert.

However, with respect to AI and the voiceover business, I have some thoughts.


To My Peers:  My Two Cents…

Please, chill out.

I get it. Some VO talents are radio refugees who fear their livelihood is again about to be ripped away by yet another technological abomination.

Let’s just face this head-on:  are there bargain-basement do-it-yourself types inclined to purchase these AI products as a cheaper alternative to hiring voice actors?  Don’t be afraid – say it.  YES.  Fine.  I say best of luck to them.


…but Something for Potential AI Clients to Consider

Let’s also speak some truths here.  Go back and listen again – carefully – to that pleasant-sounding but artificial man.  Spend a few more moments with that alluring but digitally engineered woman.  Something’s off, right?  Something missing? Like inflections?  Nuance?  Emotion?  Personality?

Yeah, cancel that bottle of wine – she ain’t interested in you.

Candidly, a text-to-speech program could work in some scenarios, and if it does, fantastic. I suppose there are scripts so transactional and dry in nature that you can get away with it.

But if you’re at all concerned about engaging a listener, conveying deep feelings or making a connection, then I’d suggest AI is a poor choice.  It’s one-dimensional and detached. People will pick up on that glaring lack of essential humanity.  It will come off as tacky.  Like wearing white athletic socks with dress shoes.

And they’ll get that uneasy, insulting feeling that they’ve been had.

That’s bad news for you if you’re fundraising, or promoting a high-end brand, or trying to impart critical information, or swaying someone to your point of view. Some situations demand authenticity, trust and an emotional hook.  AI is not going to get you there.

The marketers of AI voice simulation are so preoccupied with patting themselves on the back for successfully producing realistic-sounding vocals that they fall into the same trap as many amateurs looking to hang a shingle in the voiceover world:  believing that all you need to succeed is a great set of pipes.  Don’t get me wrong, being gifted with a resonant, nice-sounding voice is an advantage, but it’s only part of the equation. It’s about more than the sound.  It’s what you do with that sound.

At its very best, the human voice is so much more.  It’s a wonderful, versatile instrument capable of conveying the full range of human emotions as well as the most delicate shades of nuance. Consider these:

Unbridled joy.

Inconsolable grief.

All of these are expressed through our unique voice, with an assist by every fiber of our physical being.

Depending upon the project, a client may need to direct the talent to ratchet up the emotion, or dial it down, or channel a certain attitude, mood or tone.

Try that with AI.  You simply can’t program that.  And frankly, why would you want to try?

The phrase “you get what you pay for” seems appropriate here.  Sometimes all the cost-cutting, automation and efficiency in the world are no substitute for the real thing – the human touch.

Can you afford insincerity?