Two fronts of AI: addressing performance and power/cooling challenges

August 22, 2025 Esha Choukse — Microsoft

Abstract

This talk explores two fronts of scaling AI: reducing inference latency and boosting throughput for emerging model types and use cases, and addressing the power and cooling demands of hyperscale data centers. I’ll highlight platform-level optimizations that improve efficiency and responsiveness, and show how infrastructure design choices, from power delivery to efficient cooling, are becoming inseparable from AI system performance and sustainability.

Speaker Bio

Esha Choukse is a Principal Researcher in the Azure Research - Systems team. She currently leads the efficient AI research project, working across the stack to optimize the AI platform (scheduling/routing), hardware, and datacenter infrastructure for emerging GenAI workloads in the cloud, with the goal of datacenter efficiency and sustainability.