David W. Jacobs, Daphna Weinshall, et al.
IEEE Transactions on Pattern Analysis and Machine Intelligence
This paper proposes a new text-to-speech synthesis technique, for producing continuous, natural sounding speech of a specific speaker. The synthesis technique is based on selecting short speech frames from a phoneme-labeled speech database. The selection procedure involves minimization of a distortion criterion, by a dynamic programming algorithm. The proposed scheme is more flexible than many existing schemes using fixed speech segments, such as diphones. It results in a more natural synthesized speech. An efficient speech representation is used to express simply and accurately the spectral continuity of speech. A further improvement in the database search mechanism and in database size was obtained by sectioning the speech phonemes into "steady-states"and "transitions". The resulting synthesized speech quality, is satisfactory and indeed preserves the natural voice of the speaker.
David W. Jacobs, Daphna Weinshall, et al.
IEEE Transactions on Pattern Analysis and Machine Intelligence
Minerva M. Yeung, Fred Mintzer
ICIP 1997
Graham Mann, Indulis Bernsteins
DIMEA 2007
Fearghal O'Donncha, Albert Akhriev, et al.
Big Data 2021