Conference paper

Multilayered extensions to the speech synthesis markup language for describing expressiveness


In this paper we discuss possible extensions to the Speech Synthesis Markup Language (SSML) to facilitate the generation of synthetic expressive speech. The proposed extensions are hierarchical in nature, allowing specification in terms of physical parameters such as instantaneous pitch, higher-level parameters such as ToBI labels, or abstract concepts such as emotions. Low-level tags tend to change their values frequently, even within a word, while the more abstract tags generally apply to whole words, sentences or paragraphs. We envision interfaces at different levels to serve different types of users; speech experts may want to use low-level interfaces while artists may prefer to interface with the TTS system at more abstract levels.
