R. Donovan
ICASSP 2000
This paper describes a phrase splicing and variable substitution system that offers an intermediate form of automated speech production, lying between the extremes of recorded utterance playback and full Text-to-Speech synthesis. The system incorporates a trainable speech synthesizer and an application-specific set of pre-recorded phrases. The text to be synthesized is converted to a phone sequence using the phone sequences present in the pre-recorded phrases wherever possible, and a pronunciation dictionary elsewhere. The synthesizer's inventory is then augmented with the synthesis information associated with the pre-recorded phrases used to construct the phone sequence. The synthesizer performs a dynamic programming search over the augmented inventory to select the segment sequence used to produce the output speech. The system enables pre-recorded phrases to be spliced seamlessly both with other phrases and with synthetic speech, allowing very high-quality speech to be produced automatically within a limited domain.
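The pipeline in the abstract can be illustrated with a minimal sketch, assuming toy data throughout. The abstract does not specify how phrases are matched or what costs the search uses, so the following are all assumptions: greedy longest-match of word spans against recorded phrases, a single "pitch" feature, a join cost that is zero when two segments are consecutive in the same recording, and no target cost. The names `Segment`, `build_phone_sequence`, `join_cost` and `select_segments` are hypothetical. This is an illustration of dynamic-programming segment selection over an augmented inventory, not the paper's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One unit in the synthesis inventory (hypothetical, simplified)."""
    phone: str
    source: str   # recording the segment was cut from
    index: int    # position of the segment within that recording
    pitch: float  # toy acoustic feature used by the join cost


def recorded(source, phones, pitches):
    """Segments cut, in order, from one pre-recorded phrase."""
    return [Segment(p, source, i, f) for i, (p, f) in enumerate(zip(phones, pitches))]


# Toy data (assumed): one pre-recorded phrase, a tiny dictionary, a generic inventory.
PHRASES = {
    ("flight", "to"): recorded("phrase:1", ["f", "l", "ay", "t", "t", "uw"],
                               [120.0, 118.0, 115.0, 112.0, 110.0, 110.0]),
}
LEXICON = {"flight": ["f", "l", "ay", "t"], "to": ["t", "uw"],
           "boston": ["b", "aa", "s", "t", "ax", "n"]}
GENERIC = [Segment(p, f"generic:{p}", 0, 110.0)
           for p in ["f", "l", "ay", "t", "uw", "b", "aa", "s", "ax", "n"]]


def build_phone_sequence(words, phrases, lexicon):
    """Greedy longest match against recorded phrases; dictionary elsewhere.
    Returns the phone sequence and the phrase segments that were used."""
    phones, used, i = [], [], 0
    while i < len(words):
        for j in range(len(words), i, -1):
            span = tuple(words[i:j])
            if span in phrases:
                phones.extend(s.phone for s in phrases[span])
                used.extend(phrases[span])
                i = j
                break
        else:  # no recorded phrase starts at word i: use the dictionary
            phones.extend(lexicon[words[i]])
            i += 1
    return phones, used


def join_cost(a, b):
    """Free if b directly follows a in the same recording, else a pitch penalty."""
    if a.source == b.source and b.index == a.index + 1:
        return 0.0
    return 1.0 + abs(a.pitch - b.pitch)


def select_segments(phones, inventory):
    """Viterbi-style DP: one segment per phone, minimising total join cost.
    Assumes every phone has at least one candidate in the inventory."""
    cands = [[s for s in inventory if s.phone == p] for p in phones]
    prev = {s: (0.0, None) for s in cands[0]}
    history = [prev]
    for k in range(1, len(phones)):
        cur = {}
        for s in cands[k]:
            cost, back = min(((c + join_cost(q, s), q) for q, (c, _) in prev.items()),
                             key=lambda t: t[0])
            cur[s] = (cost, back)
        history.append(cur)
        prev = cur
    last = min(prev, key=lambda s: prev[s][0])
    path = [last]
    for k in range(len(phones) - 1, 0, -1):  # follow back-pointers to the start
        path.append(history[k][path[-1]][1])
    return path[::-1], prev[last][0]


phones, used = build_phone_sequence(["flight", "to", "boston"], PHRASES, LEXICON)
path, cost = select_segments(phones, GENERIC + used)  # augmented inventory
print([s.source for s in path])
print(cost)  # 6.0
```

Because the assumed join cost is zero between consecutive segments of one recording, the search keeps the recorded "flight to" intact and pays a cost only at the splice into the dictionary-derived "boston". Unit-selection synthesizers commonly also use target costs (prosodic and phonetic context), which this toy omits.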
R. Donovan
ICASSP 2000
Martin Franz, Salim Roukos
SIGIR Forum (ACM Special Interest Group on Information Retrieval)
R. Donovan
Computer Speech and Language
K. Davies, R. Donovan, et al.
INTERSPEECH - Eurospeech 1999