Toward Energy-Efficient LLM Inference Serving Systems

Time: August 20, 2025, 16:00
Lecturer: Dr. Jovan Stojković
Location: Palata nauke, Horizont Hall, 4th floor

Today, LLM inference clusters receive large numbers of queries with strict Service Level Objectives (SLOs). To achieve the required performance, these models run on power-hungry GPUs, which causes inference clusters to 1) consume large amounts of energy and produce substantial carbon emissions, and 2) require high power and cooling capacity, resulting in a high datacenter Total Cost of Ownership (TCO). In this talk, I will present two systems that address these challenges: DynamoLLM and TAPAS.

DynamoLLM is the first energy-management framework for LLM inference environments. It automatically and dynamically reconfigures the inference cluster to optimize the energy use and cost of LLM serving while meeting the service's performance SLOs. As a result, DynamoLLM reduces energy consumption, operational carbon emissions, and cost to the customer without violating latency SLOs.

TAPAS is a thermal- and power-aware scheduling scheme for GPU clusters in the cloud. It enables power and cooling oversubscription with minimal impact on performance. Through smart workload placement, request routing, and configuration tuning, TAPAS reduces thermal and power throttling events, improving system efficiency without affecting latency or the quality of results.

Biography:

Jovan Stojkovic is an incoming Assistant Professor in the Department of Computer Science at the University of Texas at Austin. He recently completed his PhD at the University of Illinois at Urbana-Champaign under the guidance of Professor Josep Torrellas. His interests are in architecture and systems for cloud and datacenter computing. His research has received multiple awards, including the HPCA Best Paper Award, an IEEE Micro Top Picks Honorable Mention, the W. J. Poppelbaum Memorial Award, the Kenichi Miura Award, and an invitation to speak at the Heidelberg Laureate Forum. Jovan completed his undergraduate studies at the University of Belgrade, School of Electrical Engineering.