Optimizing GPU Multiplexing for Efficient and Cost-Effective Access to Diverse Large Language Models in GPU Clusters. Yue Zhu, Chen Wang, et al. MASCOTS 2024.
HAL: Hardware-assisted Load Balancing for Energy-efficient SNIC-Host Cooperative Computing. Jinghan Huang, Jiaqi Lou, et al. ISCA 2024.
Best-Effort Power Model Serving for Energy Quantification of Cloud Instances. Sunyanan Choochotkaew, Tatsuhiro Chiba, et al. MASCOTS 2024.
Advancing Cloud Sustainability: A Versatile Framework for Container Power Model Training. Sunyanan Choochotkaew, Chen Wang, et al. MASCOTS 2023.
STRonG: System Topology Risk Analysis on Graphs. Lars Schneidenbach, Sandhya Koteshwara, et al. CCGrid 2024.
Towards Pareto Optimal Throughput in Small Language Model Serving. Pol G. Recasens, Yue Zhu, et al. EuroSys 2024.
Towards a Methodology and Framework for AI Sustainability Metrics. Tamar Eilam, Pedro Bello-Maldonado, et al. HotCarbon 2023.