Global RNN Transducer Models For Multi-dialect Speech Recognition. Takashi Fukuda, Samuel Thomas, et al. 2022. INTERSPEECH 2022.
Everything at Once - Multi-modal Fusion Transformer for Video Retrieval. Nina Shvetsova, Brian Chen, et al. 2022. CVPR 2022.
Towards Reducing the Need for Speech Training Data To Build Spoken Language Understanding Systems. Samuel Thomas, Jeff Kuo, et al. 2022. ICASSP 2022.
Towards End-to-end Integration of Dialog History For Improved Spoken Language Understanding. Vishal Sunder, Samuel Thomas, et al. 2022. ICASSP 2022.
Integrating Text Inputs For Training and Adapting RNN Transducer ASR Models. Samuel Thomas, Brian Kingsbury, et al. 2022. ICASSP 2022.
A new data augmentation method for intent classification enhancement and its application on spoken conversation datasets. Zvi Kons, Aharon Satt, et al. 2022. ICASSP 2022.
Improving End-to-End Models for Set Prediction in Spoken Language Understanding. Jeff Kuo, Zoltan Tuske, et al. 2022. ICASSP 2022.
Speak or chat with me: End-to-end spoken language understanding system with flexible inputs. Sujeong Cha, Wangrui Hou, et al. 2021. INTERSPEECH 2021.
Integrating dialog history into end-to-end spoken language understanding systems. Jatin Ganhotra, Samuel Thomas, et al. 2021. INTERSPEECH 2021.
Knowledge distillation based training of universal ASR source models for cross-lingual transfer. Takashi Fukuda, Samuel Thomas. 2021. INTERSPEECH 2021.