Transferring the knowledge of large language models (LLMs) is a promising technique for incorporating linguistic knowledge into end-to-end automatic speech recognition (ASR) systems. However, existing works transfer only a single representation of the LLM (e.g., the last layer of a pretrained BERT), while the representation of a text is inherently non-unique and can be obtained in various ways from different layers, contexts, and models. In this work, we explore a wide range of techniques to obtain and transfer multiple representations of LLMs into a transducer-based ASR system. While conceptually simple, we show that transferring multiple representations of LLMs leads to consistent improvements over transferring only a single representation.
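As a rough illustration of what "multiple representations from different layers" can mean in practice, the sketch below extracts hidden states from several layers of a pretrained BERT using the Hugging Face transformers library. The specific layer indices and the use of BERT-base are assumptions for illustration; the abstract does not specify them, nor how the representations are transferred into the transducer-based ASR system.

```python
# Minimal sketch (not the paper's implementation): obtain multiple
# representations of a text from different BERT layers, which could then
# serve as transfer targets for an ASR encoder.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

def multi_layer_representations(text, layers=(4, 8, 12)):
    """Return hidden states from several BERT layers (layer choice is assumed)."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = bert(**inputs, output_hidden_states=True)
    # hidden_states[0] is the embedding layer; layer i is hidden_states[i].
    return [outputs.hidden_states[i] for i in layers]

# Each representation has shape (1, num_tokens, 768) for BERT-base.
reps = multi_layer_representations("speech recognition with language models")
```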