TRANSFORM: Flexible Voice Synthesis Through Articulatory Voice Transformation

Speech Processing

We have always wanted our machines to talk to us, but most people have strong preferences for particular voices. Current techniques in speech synthesis can build voices that sound very close to the original speaker, capturing the style, manner and articulation of the source voice. However, such systems require many hours of carefully recorded speech and expert tuning to reach an acceptable level of quality. An exciting new alternative method for building synthetic voices is voice transformation. Here we use an existing recorded database and convert it to a target voice using as few as 10 to 20 sentences. These techniques offer the potential to make speech synthesizers talk in whatever voice we desire, with significantly less effort than previous techniques require.