To mix or not to mix synthetic speech and human speech? Contrasting impact on judge-rated task performance versus self-rated performance and attitudinal responses

Li Gong; Jennifer Lai

doi:10.1023/A:1022382413579

International Journal of Speech Technology

Paper

01 Apr 2003



Abstract

Since it is impractical to prerecord human speech for dynamic content such as email messages and news, many commercial speech applications use recorded human speech for fixed content (e.g. system prompts) and synthetic speech for dynamic content. However, mixing human speech and synthetic speech may not be optimal from a consistency perspective. A two-condition between-participants experiment (N = 24) was conducted to compare two versions of a telephony application for Personal Information Management (PIM). In the first condition, all the system output was delivered with synthetic speech. In the second condition, users heard a mix of human speech and synthetic speech. Users managed several email and calendar tasks. Users' task performance was rated by two independent judges. Their self-ratings of task performance and their attitudinal responses were also measured with questionnaires. Users interacting with the interface that used only synthetic speech performed the task significantly better, while users interacting with the mixed-speech interface thought they did better and had more positive attitudinal responses. A consistency framework drawn from human psychological processing is offered to explain the difference in task performance. Cognitive processing and attitudinal response are differentiated. Design implications and directions for future research are suggested.
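The two experimental conditions differ only in how each system turn is voiced. A minimal sketch of that routing decision follows, assuming a turn is split into fixed segments (content known in advance, which could be prerecorded) and dynamic segments (content such as an email sender or body, known only at run time). The names `Kind`, `Segment`, `plan_output`, the mode strings, and the recording IDs are hypothetical illustrations, not the authors' implementation, and no real audio or text-to-speech API is called.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class Kind(Enum):
    FIXED = "fixed"      # known in advance, e.g. a system prompt
    DYNAMIC = "dynamic"  # only known at run time, e.g. an email sender


@dataclass
class Segment:
    kind: Kind
    text: str
    recording_id: str = ""  # hypothetical ID of a prerecorded human clip


def plan_output(segments: List[Segment], mode: str) -> List[Tuple[str, str]]:
    """Return (voice, payload) pairs for one system turn.

    mode == "synthetic_only": every segment is sent to the synthesizer.
    mode == "mixed": fixed segments that have a recording play human
                     speech; everything else falls back to synthesis.
    """
    if mode not in ("synthetic_only", "mixed"):
        raise ValueError(f"unknown mode: {mode}")
    plan: List[Tuple[str, str]] = []
    for seg in segments:
        if mode == "mixed" and seg.kind is Kind.FIXED and seg.recording_id:
            plan.append(("human", seg.recording_id))
        else:
            plan.append(("tts", seg.text))
    return plan
```

For a turn such as `[Segment(Kind.FIXED, "You have a new message from", "prompt_new_msg"), Segment(Kind.DYNAMIC, "Alice Smith")]`, the mixed mode switches voices mid-utterance (human prompt, then synthetic name), whereas the synthetic-only mode keeps one voice throughout. That within-turn voice switch is the inconsistency the study examines.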
