SyFI Lab: Systems for Future Intelligence

Publications

The Streaming Batch Model for Efficient and Fault-Tolerant Heterogeneous Execution

Frank Sifei Luan, Ron Yifeng Wang, Yile Gu, Ziming Mao, Charlotte Lin, Amog Kamsetty, Hao Chen, Cheng Su, Balaji Veeramani, Scott Lee, SangBin Cho, Clark Zinzow, Eric Liang, Ion Stoica, Stephanie Wang — (2025)

Programmable and Adaptive Scheduling for Distributed Systems

Yuyao Wang, Xiangfeng Zhu, Ratul Mahajan, Stephanie Wang — Hot Topics in Networks (HotNets) (2025)

Piper: Towards Flexible Pipeline Parallelism for PyTorch

Megan Frisella, Arvin Oentoro, Xiangyu Gao, Gilbert Bernstein, Stephanie Wang — Practical Adoption Challenges of ML for Systems (PACMI) (2025)

LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation

Keisuke Kamahori, Jungo Kasai, Noriyuki Kojima, Baris Kasikci — Conference on Empirical Methods in Natural Language Processing (EMNLP) (2025)

FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving
Best Paper Award

Zihao Ye, Lequn Chen, Ruihang Lai, Wuwei Lin, Yineng Zhang, Stephanie Wang, Tianqi Chen, Baris Kasikci, Vinod Grover, Arvind Krishnamurthy, Luis Ceze — Annual Conference on Machine Learning and Systems (MLSys) (2025)

TeleRAG: Efficient Retrieval-Augmented Generation Inference with Lookahead Retrieval

Chien-Yu Lin, Keisuke Kamahori, Yiyu Liu, Xiaoxiang Shi, Madhav Kashyap, Yile Gu, Rulin Shao, Zihao Ye, Kan Zhu, Stephanie Wang, Arvind Krishnamurthy, Rohan Kadekodi, Luis Ceze, Baris Kasikci — arXiv preprint (2025)

Tactic: Adaptive Sparse Attention with Clustering and Distribution Fitting for Long-Context LLMs

Kan Zhu, Tian Tang, Qinyu Xu, Yile Gu, Zhichen Zeng, Rohan Kadekodi, Liangyu Zhao, Ang Li, Arvind Krishnamurthy, Baris Kasikci — arXiv preprint (2025)

Argos: Detecting Dynamic Anomalies in the Cloud with Rule Generation

Yile Gu, Hoang Doan Nguyen, Demirhan Celik, Sifat Hasan, Yifan Xiong, Jonathan Mace, Yuting Jiang, Yigong Hu, Baris Kasikci, Peng Cheng — arXiv preprint (2025)

NanoFlow: Towards Optimal Large Language Model Serving Throughput

Kan Zhu, Yufei Gao, Yilong Zhao, Liangyu Zhao, Gefei Zuo, Yile Gu, Dedong Xie, Tian Tang, Qinyu Xu, Zihao Ye, Keisuke Kamahori, Chien-Yu Lin, Ziren Wang, Stephanie Wang, Arvind Krishnamurthy, Baris Kasikci — Symposium on Operating Systems Design and Implementation (OSDI) (2025)

Towards ML System Extensibility

Weixin Deng, Andy Ruan, Megan Frisella, Kai-Hsun Chen, SangBin Cho, Jack Tigar Humphries, Rui Qiao, Stephanie Wang — Hot Topics in Operating Systems (HotOS) (2025)

Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models

Keisuke Kamahori, Tian Tang, Yile Gu, Kan Zhu, Baris Kasikci — International Conference on Learning Representations (ICLR) (2025)

DataComp-LM: In search of the next generation of training sets for language models

Jeffrey Li, Alex Fang, Georgios Smyrnis, Maor Ivgi, Matt Jordan, Samir Gadre, Hritik Bansal, Etash Guha, Sedrick Keh, Kushal Arora, Saurabh Garg, Rui Xin, Niklas Muennighoff, Reinhard Heckel, Jean Mercat, Mayee Chen, Suchin Gururangan, Mitchell Wortsman, Alon Albalak, Yonatan Bitton, Marianna Nezhurina, Amro Abbas, Cheng-Yu Hsieh, Dhruba Ghosh, Josh Gardner, Maciej Kilian, Hanlin Zhang, Rulin Shao, Sarah Pratt, Sunny Sanyal, Gabriel Ilharco, Giannis Daras, Kalyani Marathe, Aaron Gokaslan, Jieyu Zhang, Khyathi Chandu, Thao Nguyen, Igor Vasiljevic, Sham Kakade, Shuran Song, Sujay Sanghavi, Fartash Faghri, Sewoong Oh, Luke Zettlemoyer, Kyle Lo, Alaaeldin El-Nouby, Hadi Pouransari, Alexander Toshev, Stephanie Wang, Dirk Groeneveld, Luca Soldaini, Pang Wei Koh, Jenia Jitsev, Thomas Kollar, Alexandros G Dimakis, Yair Carmon, Achal Dave, Ludwig Schmidt, Vaishaal Shankar — Conference on Neural Information Processing Systems (NeurIPS) (2024)

Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference

Jiaming Tang, Yilong Zhao, Kan Zhu, Guangxuan Xiao, Baris Kasikci, Song Han — International Conference on Machine Learning (ICML) (2024)

Atom: Low-bit Quantization for Efficient and Accurate LLM Serving

Yilong Zhao, Chien-Yu Lin, Kan Zhu, Zihao Ye, Lequn Chen, Size Zheng, Luis Ceze, Arvind Krishnamurthy, Tianqi Chen, Baris Kasikci — Annual Conference on Machine Learning and Systems (MLSys) (2024)