RLVE: Scaling Up Reinforcement Learning for Language Models with Adaptive Verifiable Environments

February 6, 2026
Zhiyuan Zeng, University of Washington

Abstract

In this talk, I will discuss our recent effort on scaling supervision at the language model (LM) intelligence boundary, which is continuously evolving during LM development. We introduce Reinforcement Learning (RL) with Adaptive Verifiable Environments (RLVE), an approach that uses verifiable environments that procedurally generate problems and provide algorithmically verifiable rewards to scale up RL for LMs. RLVE enables each verifiable environment to dynamically adapt its problem difficulty distribution to the policy model’s capabilities as training progresses. Adaptive environments enable continuous discovery of supervision at the model intelligence boundary throughout RL training.
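To make the idea of an adaptive verifiable environment concrete, here is a minimal Python sketch. It is not the RLVE implementation: the task (summing integers), the class name `AdaptiveSumEnv`, the promotion threshold, and the attempt window are all assumptions chosen for illustration. It shows only the three ingredients the abstract names: procedural problem generation, an algorithmic reward check, and a difficulty distribution that widens as the policy succeeds.

```python
import random
from dataclasses import dataclass, field


@dataclass
class AdaptiveSumEnv:
    """Toy verifiable environment (hypothetical): sum a list of integers.

    Assumption for this sketch: difficulty level d means d + 1 numbers to add.
    """
    max_level: int = 1        # current upper bound of the difficulty range
    threshold: float = 0.9    # assumed accuracy needed to raise the bound
    min_attempts: int = 8     # assumed number of attempts before adapting
    rng: random.Random = field(default_factory=lambda: random.Random(0))
    _correct: int = 0
    _attempts: int = 0

    def generate(self) -> dict:
        """Procedurally generate a problem at a level sampled from [1, max_level]."""
        level = self.rng.randint(1, self.max_level)
        numbers = [self.rng.randint(0, 99) for _ in range(level + 1)]
        prompt = "Compute the sum: " + " + ".join(map(str, numbers))
        return {"level": level, "numbers": numbers, "prompt": prompt}

    def verify(self, problem: dict, answer: str) -> float:
        """Algorithmic reward: 1.0 if the answer parses to the exact sum, else 0.0."""
        try:
            return 1.0 if int(answer.strip()) == sum(problem["numbers"]) else 0.0
        except ValueError:
            return 0.0

    def update(self, problem: dict, reward: float) -> None:
        """Track accuracy at the hardest level and widen the range once it is mastered."""
        if problem["level"] != self.max_level:
            return
        self._attempts += 1
        self._correct += int(reward == 1.0)
        if self._attempts >= self.min_attempts:
            if self._correct / self._attempts >= self.threshold:
                self.max_level += 1
            self._correct = self._attempts = 0
```

In an RL loop, the policy's response to `problem["prompt"]` would be passed to `verify`, and the resulting reward would feed both the policy update and `update`, so harder problems appear only once the current hardest level is being solved reliably.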

Speaker Bio

Zhiyuan Zeng is a second-year Ph.D. student in the Paul G. Allen School of Computer Science & Engineering at the University of Washington, advised by Hannaneh Hajishirzi and Pang Wei Koh. He received his bachelor’s degree from the Department of Computer Science and Technology at Tsinghua University in China. During that time, he worked with Danqi Chen at Princeton University and Zhiyuan Liu at Tsinghua University. Zhiyuan is a recipient of the 2025 Amazon AI Ph.D. Fellowship, the 2022 SenseTime Scholarship, the 2022 China National Scholarship for undergraduate students, and a Gold Medal in the 2019 China National Olympiad in Informatics (NOI).
