Efficient Interactive LLM Serving with Proxy Model-based Sequence Length Prediction. Haoran Qiu, Weichao Mao, et al. ASPLOS 2024.
Evaluating Hardware Memory Disaggregation under Delay and Contention. Archit Patke, Haoran Qiu, et al. IPDPS 2022.
Is Function-as-a-Service a Good Fit for Latency-Critical Services? Haoran Qiu, Saurabh Jha, et al. Middleware 2021.