Making Computers Talk

HeXp£Øi±

Say good-bye to stilted electronic chatter: new synthetic-speech systems sound authentically human, and they can respond in real time
By Andy Aaron, Ellen Eide and John F. Pitrelli
Dial up a bank or airline these days, and chances are your call will be answered by a prerecorded voice rather than a live human being. By stringing together several canned phrases, such systems do an adequate job of bringing a banking or ticket-booking transaction to a successful conclusion. Though the cobbled-together speech sounds stilted, these systems are sufficient for handling limited transactions, where the subject matter is known in advance. But since they can't stray from their prerecorded phrases, their capabilities are limited.

Synthetic-speech researchers at IBM have been tackling a much tougher challenge: making computers say anything a live person could say, and in a voice that sounds natural. For example, we've developed systems that can "read" a breaking news story or a bunch of e-mail messages aloud over the phone. Like the current phrase-splicing systems, our newest ones, called Supervoices, are also based on recordings of a human speaker and they can respond in real time. But the difference is that they can utter anything at all--including natural-sounding words the original speaker never said.

What are the immediate uses of this technology? They include delivery of up-to-the-minute news, reading machines for the handicapped, automotive voice controls and retrieving e-mail over the phone--or any system where the vocabulary is large, the content changes frequently or unpredictably, and a visual display isn't practical. In the future Supervoices could enhance video and computer games, handheld devices and even motion-picture production. IBM released the latest generation of the technology for commercial use in late 2002.

Talk to Me

Scientists have attempted to simulate human speech since the late 1700s, when Wolfgang von Kempelen built a "Speaking Machine" that used an elaborate series of bellows, reeds, whistles and resonant chambers to produce rudimentary words. By the 1970s digital computing enabled the first generation of modern text-to-speech systems to reach fairly wide use. Makers of these systems attempted to model the entire speech production process directly, using a relatively small number of parameters. The result was intelligible, though somewhat robotic-sounding, speech. The advent of faster computers and inexpensive data storage in the late 1990s made today's most advanced synthetic speech possible. It is based on the premise that speech is composed of a finite number of linguistic building blocks called phonemes and that these can be arranged in new sequences to create any word. Therefore, a set of recordings of a speaker uttering all these building blocks can serve as a kind of typesetter's case for assembling speech.

Supervoices use this building-block model. While most of us think of language in terms of letters or words, the software treats it as a series of phonemes. English contains about 40 unique phonemes. For example, the word "please" is composed of four: P, L, EE and Z. Supervoice contains a collection of recorded samples of each phoneme. When it comes time to speak, the software grabs the appropriate samples needed to piece together new words.
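To make the building-block idea concrete, here is a minimal Python sketch of looking up a word's phonemes and splicing stored samples together. It is only an illustration, not IBM's implementation: the tiny lexicon, the made-up sample numbers, and the function name synthesize are all assumptions, and a real system would choose among many recordings of each phoneme and smooth the joins.

```python
# Toy illustration of phoneme concatenation (not the Supervoice algorithm).
# LEXICON and SAMPLES are hypothetical; the phoneme labels follow the
# article's "please" = P, L, EE, Z example.

LEXICON = {
    "please": ["P", "L", "EE", "Z"],
    "lee": ["L", "EE"],
    "peas": ["P", "EE", "Z"],
}

# Stand-in for recorded audio: each phoneme maps to a short list of
# invented sample values instead of a real waveform.
SAMPLES = {
    "P": [0.1, 0.2],
    "L": [0.3],
    "EE": [0.4, 0.5, 0.6],
    "Z": [0.7],
}


def synthesize(word):
    """Return (phonemes, audio) for a word by concatenating stored samples."""
    phonemes = LEXICON[word.lower()]
    audio = []
    for phoneme in phonemes:
        # Grab the stored sample for this phoneme and append it to the output.
        audio.extend(SAMPLES[phoneme])
    return phonemes, audio


phonemes, audio = synthesize("peas")
print(phonemes, len(audio))
```

Note that "peas" can be assembled from the same four building blocks as "please", which is the sense in which such a system can say words the speaker never recorded.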

Speech synthesis starts with a human voice, so our team typically auditions dozens of speakers to find the right one for a given task. We usually look for someone with an agreeable voice and who has good, clear pronunciation that is also free of any significant regional accent; at times, however, we may need other characteristics for a specialized application, such as synthesizing English with a foreign inflection or for a robot voice in a movie. The speaker who lands the part sits in a sound booth and reads several thousand sentences, which take more than a week to record. The sentences are chosen for their diverse phonetic content, to ensure that we capture lots of examples of all the English phonemes in many different contexts. The result is a collection of several thousand voice files.
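The article does not say how the sentences are picked for "diverse phonetic content". One plausible approach, shown here purely as an assumption, is a greedy pass that keeps choosing whichever candidate adds the most phonemes not yet covered. The function name pick_sentences and the sample data are hypothetical, and covering phonemes in different contexts would need a richer unit than single phonemes.

```python
# Greedy coverage sketch (an assumed technique, not the authors' stated method).

def pick_sentences(candidates, budget):
    """Pick up to `budget` items that together cover the most phonemes.

    candidates: dict mapping a sentence (or word) to its list of phonemes.
    Returns (chosen, covered): the chosen items in order and the phoneme set.
    """
    covered = set()
    chosen = []
    remaining = dict(candidates)
    while remaining and len(chosen) < budget:
        # Find the candidate contributing the most not-yet-covered phonemes.
        best = max(remaining, key=lambda s: len(set(remaining[s]) - covered))
        gain = set(remaining[best]) - covered
        if not gain:
            break  # nothing left adds new phonemes
        chosen.append(best)
        covered |= gain
        del remaining[best]
    return chosen, covered


# Hypothetical candidates, using the article's informal phoneme labels.
candidates = {
    "please": ["P", "L", "EE", "Z"],
    "lee": ["L", "EE"],
    "zoo": ["Z", "OO"],
    "pea": ["P", "EE"],
}
chosen, covered = pick_sentences(candidates, budget=3)
print(chosen, sorted(covered))
```

With this data the loop stops after two picks, because the remaining candidates add nothing new.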
Continued...
 
Always been interested in this because it really is a complex process. Getting better but you can still tell it's computer generated.
 
I'm still waiting. I've used Microsoft Agent from time to time, basically when I can remove all other noise from the house, which is not very often these days. I think it will be here eventually where you can come home, tell your computer to check your mail, respond to what needs to be responded to, and even post a few messages here without touching your keyboard.
 
Agent lets you do a lot, but it involves a lot of "training", basically a lot of time spent reading sentence after sentence so it learns the way you speak. Then if someone else gets on the computer, it doesn't know a shittin thing. :retard:
 
The mac will respond to anyone if you know how to ask. You can even ask it to tell you a joke.
 
Squiggy said:
You can even ask it to tell you a joke.


G4: How many Mac users does it take to change a lightbulb?

MacUser: I don't know. How many Mac users does it take to change a lightbulb?

G4: Mac users refuse to change and will always remain in the dark.

:jester: J/K

No, really, it was just a joke, honest :hippy:
 
Squiggy said:
I can tell my computer to do things. Macs have had that capability for a long time.
My old P133 had that capability. It wasn't very advanced and needed about a half an hour of training (it could compensate for background noise, assuming that background noise stayed the same throughout the time you were using the computer), also a fairly good mic. But was still pretty neat to try out. My dad trained it and it understood me pretty well.

There's voice activated dialing on my phone, too. I have to say the name once or twice to get it to recognize what I said sometimes.
 
I tried training my old computer to recognise my voice - it still wrote tons of garbled crap. :(

My daughter has one of those talking word processors - Clicker 4 - it sounds a bit like Stephen Hawking. I could get a new female British voice for it if I wanted to, but it doesn't make that many mistakes.
 