Global RNN Transducer Models For Multi-dialect Speech Recognition. Takashi Fukuda, Samuel Thomas, et al. INTERSPEECH 2022.
Improving Generalization of Deep Neural Network Acoustic Models with Length Perturbation and N-best Based Label Smoothing. Xiaodong Cui, George Saon, et al. INTERSPEECH 2022.
Tokenwise Contrastive Pretraining for Finer Speech-to-BERT Alignment in End-to-End Speech-to-Intent Systems. Vishal Sunder, Eric Fosler-Lussier, et al. INTERSPEECH 2022.
Everything at Once - Multi-modal Fusion Transformer for Video Retrieval. Nina Shvetsova, Brian Chen, et al. CVPR 2022.
Integrating Text Inputs For Training and Adapting RNN Transducer ASR Models. Samuel Thomas, Brian Kingsbury, et al. ICASSP 2022.
Decentralized Bilevel Optimization for Personalized Client Learning. Songtao Lu, Xiaodong Cui, et al. ICASSP 2022.
A new data augmentation method for intent classification enhancement and its application on spoken conversation datasets. Zvi Kons, Aharon Satt, et al. ICASSP 2022.
Integrating dialog history into end-to-end spoken language understanding systems. Jatin Ganhotra, Samuel Thomas, et al. INTERSPEECH 2021.
On the limit of English conversational speech recognition. Zoltan Tuske, George Saon, et al. INTERSPEECH 2021.
Reducing exposure bias in training recurrent neural network transducers. Xiaodong Cui, Brian Kingsbury, et al. INTERSPEECH 2021.