August 22, 2025 Esha Choukse — Microsoft
This talk explores two fronts of scaling AI: reducing inference latency and boosting throughput for emerging model types and use cases, and addressing the power and cooling demands of hyperscale data centers. I’ll highlight platform-level optimizations that improve efficiency and responsiveness, and show how infrastructure design choices, from power delivery to efficient cooling, are becoming inseparable from AI system performance and sustainability.
Esha Choukse is a Principal Researcher in the Azure Research - Systems team. She currently leads the efficient AI research project, working across the stack to optimize the AI platform (scheduling/routing), hardware, and datacenter infrastructure for emerging GenAI workloads in the cloud, with the goal of datacenter efficiency and sustainability.