Phrase splicing and variable substitution using the IBM trainable speech synthesis system

R. Donovan; Martin Franz; J. Sorensen; Salim Roukos

doi:10.1109/icassp.1999.758140

ICASSP 1999

Conference paper

15 Mar 1999

Abstract

This paper describes a phrase splicing and variable substitution system that offers an intermediate form of automated speech production, lying between the extremes of recorded utterance playback and full text-to-speech synthesis. The system incorporates a trainable speech synthesizer and an application-specific set of pre-recorded phrases. The text to be synthesized is converted to a phone sequence, using phone sequences present in the pre-recorded phrases wherever possible and a pronunciation dictionary elsewhere. The synthesis inventory of the synthesizer is augmented with the synthesis information associated with the pre-recorded phrases used to construct the phone sequence. The synthesizer then performs a dynamic programming search over the augmented inventory to select the segment sequence from which the output speech is produced. The system enables the seamless splicing of pre-recorded phrases both with other phrases and with synthetic speech, and it allows very high quality speech to be produced automatically within a limited domain.
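The two steps outlined in the abstract (phrase-first phone lookup, then a dynamic programming search over candidate segments) can be illustrated with a toy sketch. It is not the IBM system's implementation: the function names `text_to_phones`, `join_cost` and `select_segments`, the greedy longest-match strategy, the phone symbols, the example phrases and the 0/1 join cost are all assumptions made for illustration only.

```python
from typing import Callable, Dict, List, Tuple

Segment = Tuple[str, int]  # hypothetical id: (recording name, position in it)

def text_to_phones(words: List[str],
                   phrases: Dict[Tuple[str, ...], List[str]],
                   dictionary: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return (phone, source) pairs, preferring the longest recorded phrase.

    Greedy longest match is an assumption of this sketch. A word found in
    neither table raises KeyError.
    """
    out: List[Tuple[str, str]] = []
    i = 0
    while i < len(words):
        # Try the longest word span starting at i that matches a phrase.
        for j in range(len(words), i, -1):
            key = tuple(words[i:j])
            if key in phrases:
                out.extend((p, "phrase") for p in phrases[key])
                i = j
                break
        else:
            # No recorded phrase covers this word: use the dictionary.
            out.extend((p, "dict") for p in dictionary[words[i]])
            i += 1
    return out

def join_cost(a: Segment, b: Segment) -> float:
    """Toy cost: free if b directly follows a in the same recording."""
    return 0.0 if a[0] == b[0] and b[1] == a[1] + 1 else 1.0

def select_segments(candidates: List[List[Tuple[Segment, float]]],
                    cost: Callable[[Segment, Segment], float]) -> List[Segment]:
    """Viterbi-style search: candidates[t] lists (segment, target_cost)."""
    best = [(c, [seg]) for seg, c in candidates[0]]
    for options in candidates[1:]:
        new_best = []
        for seg, target in options:
            # Cheapest way to reach this segment from any previous one.
            prev, path = min(((c + cost(p[-1], seg), p) for c, p in best),
                             key=lambda x: x[0])
            new_best.append((prev + target, path + [seg]))
        best = new_best
    return min(best, key=lambda x: x[0])[1]

# Hypothetical limited-domain data.
PHRASES = {
    ("your", "flight"): ["Y", "AO", "R", "F", "L", "AY", "T"],
    ("departs", "at"): ["D", "IH", "P", "AA", "R", "T", "S", "AE", "T"],
}
DICTIONARY = {"noon": ["N", "UW", "N"], "your": ["Y", "UH", "R"]}

phones = text_to_phones("your flight departs at noon".split(),
                        PHRASES, DICTIONARY)

# Augmented inventory for three phones: recorded-phrase segments compete
# with segments from the general synthesis inventory ("synth").
CANDIDATES = [
    [(("phrase", 0), 0.0), (("synth", 0), 0.5)],
    [(("phrase", 1), 0.0), (("synth", 7), 0.5)],
    [(("synth", 8), 0.5)],
]
chosen = select_segments(CANDIDATES, join_cost)
```

In this sketch the search keeps contiguous segments from the recorded phrase together and switches to the general inventory only where the phrase offers no candidate, which is the behaviour the abstract attributes to searching the augmented inventory. A real system would use acoustic and prosodic costs rather than the 0/1 join cost assumed here.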
