Optimizing GPU Multiplexing for Efficient and Cost-Effective Access to Diverse Large Language Models in GPU Clusters. Yue Zhu, Chen Wang, et al. 2024. MASCOTS 2024.
Kepler: A Framework to Calculate the Energy Consumption of Containerized Applications. Marcelo Amaral, Huamin Chen, et al. 2023. CLOUD 2023.
ImageJockey: A framework for container performance engineering. Takeshi Yoshimura, Rina Nakazawa, et al. 2020. CLOUD 2020.
Taming Performance Degradation of Containers in the Case of Extreme Memory Overcommitment. Rina Nakazawa, Kazunori Ogata, et al. 2017. CLOUD 2017.
Predicting LLM Inference Latency: A Roofline-Driven ML Method. Saki Imai, Rina Nakazawa, et al. 2024. NeurIPS 2024.
Best-Effort Power Model Serving for Energy Quantification of Cloud Instances. Sunyanan Choochotkaew, Tatsuhiro Chiba, et al. 2024. MASCOTS 2024.
ConfAdvisor: A performance-centric configuration tuning framework for containers on Kubernetes. Tatsuhiro Chiba, Rina Nakazawa, et al. 2019. IC2E 2019.
Visualization Tool for Designing Microservices with the Monolith-First Approach. Rina Nakazawa, Takanori Ueda, et al. 2018. VISSOFT 2018.