Effectively Scaling Reinforcement Learning for LLMs

December 1, 2025
Yi Wu, Tsinghua University

Abstract

Reinforcement learning (RL) has been an engine for recent LLM advances, from RLHF for ChatGPT, to reasoning RL for thinking models, and more recently, agentic RL for agent products. In this talk, we will discuss and address the main scaling challenges of RL training for LLMs, starting with RLHF and moving on to reasoning RL and agentic RL. We will cover three works: (1) the ReaL system for efficient RLHF (https://github.com/openpsi-project/ReaLHF, https://arxiv.org/abs/2406.14088, MLSys 2025), (2) the AReaL system for fully asynchronous reasoning RL (https://github.com/inclusionAI/AReaL, https://arxiv.org/abs/2505.24298, NeurIPS 2025), and (3) ASearcher, an end-to-end RL search agent trained by AReaL (https://github.com/inclusionAI/ASearcher, https://arxiv.org/abs/2508.07976).

Speaker Bio

Yi Wu is an assistant professor at the Institute for Interdisciplinary Information Sciences (IIIS), Tsinghua University. He obtained his Ph.D. from UC Berkeley and was a researcher at OpenAI from 2019 to 2020. His research focuses on reinforcement learning, multi-agent learning, and LLM agents. His representative works include the value iteration network, the MADDPG and MAPPO algorithms, OpenAI’s hide-and-seek project, and the AReaL project. He received the best paper award at NIPS 2016, was a best demo award finalist at ICRA 2024, and was named to MIT TR35 Asia Pacific 2025.