- Published Aug 29, 2013 in The Biz
- Read time: about 4 minutes
Dave Courvoisier looks at the emerging voice synthesis technology and what it means for future voice-acting jobs.
When I walked out of a movie theatre in the summer of 1968, having just seen "2001: A Space Odyssey", I had lots of questions. Sci-fi was my genre, and yet here was a movie that just befuddled my 16-year-old Illinois farm-kid brain.
No matter, really, because I was a "Hal 9000" fan evermore. The soft, comforting tones of the spaceship's onboard computer (voiced by Douglas Rain) stole the show for me. Unbelievable! A computer that sounds more human than most humans!
Well, 2001 has come and gone... heck, even George Orwell's 1984 is now a distant memory, and no plotlines from those landmark stories have come true (ACLU claims notwithstanding). Likewise, no computer-generated voice like Hal's exists either. Or... does it?
Visit the Loquendo.com website someday, and be amazed by the emotion, phrasing, and endearingly human qualities of their sample synthetic voices. Most who visit say they've never heard a computer-generated voice sound that good. To a lesser extent, Lessatech.com is also experimenting with speech that approximates the intonation and pacing of the human voice.
Something to do with universal access.
None of this should come as a surprise to anyone who's watching recent trends in technology. Do a Twitter search for the word "voiceover" and you'll see a healthy number of tweets discussing the new functionality Apple has designed into its software, especially as a feature of the iPhone, called, appropriately, VoiceOver. People who are visually impaired love the program because it reads words on the screen out loud. Apple calls it a "spoken English interface".
Amazon made similar capabilities available with the release of its second-generation Kindle electronic book reader. Virtually any content held in memory on the device can be read out loud by software called "Read to Me".
Granted, the quality of the sound is quite mechanical, anonymous, flat, unemotional, and plodding...but it's understandable...and don't think for a minute the software designers and engineers working in this area are just going to leave it where it sits.
In fact, drilling down into this issue brings to the surface an intersection of creativity, marketing, and price point that voice-actors can't afford to ignore in the long term.
Rise of the machines.
The innovation that produces a human-sounding voice from a computer, with all the lilts, nuances, and timing that make it genuine, requires perhaps as much technical artistry from the software engineer as it does experience from the voice-actor. What market forces will eventually push some clients and vendors to choose the "fake" voice over the human voice?
Some believe the writing is on the wall. One observer told me that if the price point comes down by half, or if the quality goes up by another 20%, or even if the application to convert text to voice becomes easier (thus saving time and money for the 'producer'), then voice acting price arbitrage will open up to synthesized voices for sure.
Most voice-actors feel that the encroachment of synthetic voices will hit the industrial/corporate market first, and that audiobook publishers will hold out longer. Professional voice-over artist Peter Drew said it this way: "Why pay someone to read an already dull script to accompany a human resources video on the latest changes to the company's benefits package? With hi-def hand-held cameras and desktop video production, many small retailers can make their own videos or contract a local agency to crank out a video for little money, saving even more by using a 'voice in a box'."
Quite a strong contingent of audiobook listeners and narrators believe a computer voice will never replace a human one for capturing the delicate spirit of the spoken word. Many point out that listeners of long-form narrations won't tolerate compression or other forms of audio processing, citing "ear fatigue". Wouldn't a long-form synthesized voice pose similar challenges?
Analogies to the music industry might serve this discussion. Take, for instance, the many imitation sounds engineered into some electronic keyboards today. The audiophile can discern the difference, but most average music listeners can't, and don't much care... if it means a less-expensive download for their iPod. Vegas stage shows all used to have live orchestras; now most musicians have a hard time finding work on the Strip. In fact, the electronic equivalent of human-generated music gained a foothold as a genre and a market all its own many years ago.
So, could a computer-synthesized voice approach the anguish in the voice of a jilted lover, or a woman giving birth? The jury's still out, but technical innovators tinkering with ever more sophisticated mathematical formulas and honing artificial intelligence programs to a sharp edge will keep trying. My guess is we'll hear synthetic voices that rise to the level of chess-playing software in their ability to innovate, learn, and approximate human nuances. That's when market forces will determine whether it's worth the cost to customers.
In the meantime, whenever I hear someone say my name, "Dave", I hear echoes of Hal and wonder if it's a real human or just a computer gone wild.