Blog

Ekka: Automated Diagnosis of Silent Errors in LLM Inference

June 29, 2026

We present Ekka, an automated system that diagnoses silent errors in LLM serving frameworks via differential debugging: aligning and comparing a buggy framework's intermediate states against a trusted reference to pinpoint the root cause. Ekka reaches 80% pass@1 accuracy on 17 real-world vLLM and SGLang bugs and has already found 4 new bugs that developers have confirmed.

TraceLab: Characterizing Coding Agent Workloads for LLM Serving

New Research Inference & Serving Agents

June 25, 2026

As coding agents become a major LLM application, serving them efficiently is an open systems problem. We release two things: the SyFI coding trace (~4,300 real sessions and 55B tokens, collected from our daily Claude Code and Codex use and intended for performance modeling) and TraceLab, an open pipeline to collect, sanitize, analyze, and replay your own coding agent traces.

M*: A Modular, Extensible, Serving System for Multimodal Models

New Research Inference & Serving Multimodal Programmable Systems

June 19, 2026

Today's models no longer fit the mold of autoregressive token generation. M* treats composite multimodal models as dataflow graphs and requests as Walks on those graphs, serving image, audio, video, and robot-action models at or above the performance of specialized engines.

Introducing Piper: A Programmable Distributed Training System

New Research Training Programmable Systems

June 05, 2026

We present Piper, a programmable distributed training system that uses model annotations and scheduling directives to express model placement, pipeline scheduling, and GPU stream scheduling.

SyFI Team Wins CUDA Kernel Agent Contest at MLSys 2026

Lab Update Agents

May 28, 2026

Team UW SyFI won three awards across two tracks at the FlashInfer AI Kernel Generation Contest at MLSys 2026 — every line of kernel code written by coding agents, not humans.

Let AI Agents Write Your Serving Stack with VibeServe

New Research Inference & Serving Agents

May 12, 2026

We present VibeServe, a multi-agent system that synthesizes a complete LLM serving runtime end-to-end, specialized to a user-specified model, hardware, and workload.

SyFI in January 2026: A Big Month for Systems-Driven AI Research

Lab Update Inference & Serving Training ML + Data

January 31, 2026

January 2026 was a milestone month for the SyFI Lab, with six papers published across MLSys and ICLR—spanning inference, training, scheduling, retrieval, and model architecture.

Meet LLMc: Beating All Compression with LLMs

New Research Inference & Serving

October 03, 2025

We present LLMc, an open-source tool to compress natural language using LLMs as the world's most reference-packed dictionary.

Efficient Serving of SpeechLMs with VoxServe

New Research Inference & Serving Multimodal

September 29, 2025

We present VoxServe, a high-throughput, low-latency serving system designed specifically for Speech Language Models.