Asaf Rendel, Raul Fernandez, et al.
ICASSP 2016
In this work we explore data-augmentation techniques to improve the performance of a supervised recurrent-neural-network classifier that predicts prosodic-boundary and pitch-accent labels. The approach applies voice transformations to the training data that modify the pitch baseline and range, as well as the vocal-tract and vocal-source characteristics of the speakers, in order to generate additional training examples. We demonstrate the validity of the approach both when the amount of labeled training data is small (showing error reductions in the 7%-12% range under reduced-data conditions) and in terms of generalization to speakers unseen in the training set (showing average relative error-rate reductions of 8.74% and 4.75% for the boundary and accent tasks, respectively, in leave-one-speaker-out validation).
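As a rough illustration of the augmentation idea described in the abstract, the sketch below generates pitch-shifted copies of a training utterance using librosa. This is an assumed, simplified stand-in rather than the paper's actual voice-transformation pipeline (which also modifies pitch range and vocal-tract/vocal-source characteristics); the file path, shift values, and function name are hypothetical.

```python
# Minimal sketch of pitch-based data augmentation for prosodic-event
# classification. Illustrative only: the paper's transformations go
# beyond simple pitch shifting.
import librosa


def augment_with_pitch_shift(wav_path, semitone_shifts=(-2.0, 2.0)):
    """Return the original waveform plus pitch-shifted copies."""
    y, sr = librosa.load(wav_path, sr=None)  # keep native sample rate
    variants = [y]
    for n_steps in semitone_shifts:
        # Shift the pitch baseline up/down by n_steps semitones while
        # preserving duration, so boundary/accent alignments still hold.
        variants.append(librosa.effects.pitch_shift(y, sr=sr, n_steps=n_steps))
    return variants, sr


if __name__ == "__main__":
    # Hypothetical usage: each augmented waveform would be re-featurized
    # and added to the training set with the original prosodic labels.
    variants, sr = augment_with_pitch_shift("speaker01_utt001.wav")
    print(f"Generated {len(variants)} training variants at {sr} Hz")
```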
Victor Soto, Lidia Mangu, et al.
INTERSPEECH 2014
Andrew Rosenberg, Raul Fernandez, et al.
ICASSP 2018
Raul Fernandez, Asaf Rendel, et al.
ICASSP 2013