The SyFI Lab at the University of Washington builds efficient and resilient infrastructure for the future of AI. As applications grow more complex, we bridge the gap between next-gen models and heterogeneous hardware through cross-stack innovation, delivering scalable, open-source systems validated by industrial partners.
Our research targets three key areas:
Shanli Xing, Yiyan Zhai, Alexander Jiang, Yixin Dong, Yong Wu, Zihao Ye, Charlie Ruan, Yingyi Huang, Yineng Zhang, Liangsheng Yin, Aksara Bayyapu, Luis Ceze, Tianqi Chen — Annual Conference on Machine Learning and Systems (MLSys) (2026)
Yilong Zhao, Jiaming Tang, Kan Zhu, Zihao Ye, Chi-Chih Chang, Chaofan Lin, Jongseok Park, Guangxuan Xiao, Mohamed S. Abdelfattah, Mingyu Gao, Baris Kasikci, Song Han, Ion Stoica — Annual Conference on Machine Learning and Systems (MLSys) (2026)
Yi Pan, Yile Gu, Jinbin Luo, Yibo Wu, Ziren Wang, Hongtao Zhang, Ziyi Xu, Shengkai Lin, Baris Kasikci, Stephanie Wang — Annual Conference on Machine Learning and Systems (MLSys) (2026)
January 31, 2026
January 2026 was a milestone month for the SyFI Lab, with six papers published across MLSys and ICLR, spanning inference, training, scheduling, retrieval, and model architecture.

October 03, 2025
We present LLMc, an open-source tool to compress natural language using LLMs as the world's most reference-packed dictionary.
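To make the dictionary analogy concrete, here is a minimal sketch of the general idea behind LLM-based compression, not the LLMc implementation itself: replace each token with its rank under a causal LM's next-token distribution. On natural text the ranks skew heavily toward zero, so a standard entropy coder shrinks them well. The gpt2 checkpoint and the rank transform are illustrative assumptions.

```python
# A sketch of rank-based LLM compression: the model's next-token
# distribution acts as the "dictionary". NOTE: "gpt2" and the rank
# transform are illustrative choices, not what LLMc itself uses.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

@torch.no_grad()
def text_to_ranks(text: str) -> tuple[int, list[int]]:
    """Replace each token (after the first) with its rank under the
    model's prediction at that position."""
    ids = tok(text, return_tensors="pt").input_ids[0]
    # Causal masking means logits[i] depends only on tokens 0..i,
    # so the decoder can recompute the exact same distributions.
    logits = model(ids.unsqueeze(0)).logits[0]
    ranks = []
    for i in range(len(ids) - 1):
        order = torch.argsort(logits[i], descending=True)
        ranks.append(int((order == ids[i + 1]).nonzero()))
    return int(ids[0]), ranks

@torch.no_grad()
def ranks_to_text(first_id: int, ranks: list[int]) -> str:
    """Invert text_to_ranks by replaying the model step by step."""
    ids = [first_id]
    for r in ranks:
        logits = model(torch.tensor([ids])).logits[0, -1]
        order = torch.argsort(logits, descending=True)
        ids.append(int(order[r]))
    return tok.decode(ids)

first, ranks = text_to_ranks("The quick brown fox jumps over the lazy dog.")
print(ranks)                        # mostly small integers on natural text
print(ranks_to_text(first, ranks))  # round-trips to the original string
```

The ranks themselves are not the compressed output; in practice they would be fed to an entropy coder (or replaced by arithmetic coding directly over the model's probabilities), which is where the actual size reduction comes from.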
September 29, 2025
We present VoxServe, a high-throughput, low-latency serving system designed specifically for Speech Language Models.
December 12, 2025
In Gim — Yale University
As LLMs become the core of modern AI applications, inference efficiency has become critical, not just for speed but also for sustainability. An old lesson of systems design is that efficiency arises from understanding the workload. Yet today's LLM serving systems are largely application-agnostic: they are optimized for generic text completion, while real applications now perform far richer tasks such as invoking tools, retrieving data, executing code, and coordinating with other agents. This raises a question: how should we rethink LLM serving, not from the system's perspective, but from the application's? In this talk, I will explore that question and show how an application-centered approach leads to serving systems that are more programmable, flexible, and application-aware.
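As a toy illustration of what "application-aware" could mean in practice (a hypothetical sketch, not the system presented in the talk; every name in it is invented): if the application declares its control flow up front, the server can plan resource use, for example keeping a KV cache resident across steps that share context instead of treating each step as an independent completion.

```python
# A hypothetical sketch of application-aware scheduling. Every name here
# (Step, AppRequest, schedule) is invented for illustration and is NOT
# from the talk or any real serving system.
from dataclasses import dataclass, field

@dataclass
class Step:
    kind: str             # e.g. "retrieve", "generate", "tool_call"
    reuses_context: bool  # does this step continue from the same prompt?

@dataclass
class AppRequest:
    prompt: str
    plan: list[Step] = field(default_factory=list)

def schedule(req: AppRequest) -> list[str]:
    """Toy scheduler: with the plan visible, it can pin the KV cache
    across steps that share context instead of evicting it between
    what would otherwise look like unrelated requests."""
    actions = []
    for step in req.plan:
        if step.reuses_context:
            actions.append(f"{step.kind}: keep KV cache resident")
        else:
            actions.append(f"{step.kind}: fresh context, cache evictable")
    return actions

req = AppRequest(
    prompt="Summarize the retrieved documents.",
    plan=[Step("retrieve", False), Step("generate", True), Step("tool_call", True)],
)
print("\n".join(schedule(req)))
```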
In Gim is a fourth-year Ph.D. student at Yale University, advised by Prof. Lin Zhong. His research focuses on systems for machine learning, specifically programmable systems for AI. His first-author papers have appeared at venues including SOSP, MLSys, MobiSys, HotOS, EMNLP, and AAAI.