Yuan Ni, Qiong Kai Xu, et al.
WSDM 2016
Word emphasis prediction is an important part of expressive prosody generation in modern Text-To-Speech (TTS) systems. We present a method for predicting emphasized words for expressive TTS, based on a Deep Neural Network (DNN). We show that the presented method outperforms machine learning methods based on hand-crafted features in terms of objective metrics such as precision and recall. Using a listening test, we further demonstrate that the contribution of the predicted emphasized words to the expressiveness of the synthesized speech is subjectively perceivable.
Haggai Roitman, Yosi Mass
ICTIR 2019
Slava Shechtman
ICASSP 2013
Emmanouil Schinas, Symeon Papadopoulos, et al.
PCI 2013