Training Nonlinear Transformers for Chain-of-Thought Inference: A Theoretical Generalization Analysis. Hongkang Li, Songtao Lu, et al. 2025. ICLR 2025.
M2 ASR: Multilingual Multi-task Automatic Speech Recognition via Multi-objective Optimization. A Saif, Lisha Chen, et al. 2024. INTERSPEECH 2024.
How Do Nonlinear Transformers Learn and Generalize in In-Context Learning? Hongkang Li, Meng Weng, et al. 2024. ICML 2024.
How Do Nonlinear Transformers Acquire Generalization-Guaranteed CoT Ability? Hongkang Li, Meng Weng, et al. 2024. ICML 2024.
How Can Personalized Context Help? Exploring Joint Retrieval of Passage and Personalized Context. Hui Wan, Hongkang Li, et al. 2024. ICASSP 2024.
Joint Unsupervised and Supervised Training for Automatic Speech Recognition via Bilevel Optimization. A F M Saif, Xiaodong Cui, et al. 2024. ICASSP 2024.
Bilevel Joint Unsupervised and Supervised Training for Automatic Speech Recognition. Xiaodong Cui, A.F.M. Saif, et al. 2024. IEEE/ACM TASLP.
Improving RNN Transducer Acoustic Models for English Conversational Speech Recognition. Xiaodong Cui, George Saon, et al. 2023. INTERSPEECH 2023.
Compressed Decentralized Proximal Stochastic Gradient Method for Nonconvex Composite Problems with Heterogeneous Data. Yonggui Yan, Jie Chen, et al. 2023. ICML 2023.