Tao Gui (桂韬)

Associate Professor
NLP Group, Fudan University

My research sits at the intersection of complex reasoning and AI agents for large language models. My group builds systems that can reason, plan, self-evolve, and act in diverse environments, and we study how to align them with human values through reinforcement learning.

I co-lead the NLP-LI lab at Fudan with Qi Zhang, within Xuanjing Huang's group. Ph.D. from Fudan (2021), B.S. from NUDT.

LLM Reasoning · AI Agents · RLHF & Alignment · Multimodal LLM
Agents & Environments
Preprint 2026
A systematic benchmark evaluating how well LLMs learn from contextual demonstrations, covering diverse task types and context configurations.
ICLR 2026
An open-source RL framework for training agents on long-horizon tasks via multi-turn interactions, extending the AgentGym ecosystem with scalable reinforcement learning.
Preprint 2025
A comprehensive survey on memory for foundation model agents — proposing a unified taxonomy of short-term, long-term, and parametric memory, and charting the path toward persistent, capable agents.
Preprint 2025
An AI-powered academic search engine combining intelligent retrieval, paper understanding, and multi-document synthesis to help researchers navigate the literature.
Preprint 2025
A unified ecosystem for constructing large-scale training environments, enabling systematic agent capability building through scalable environment generation and curriculum design.
Preprint 2025
Automatically assesses the novelty of scientific submissions via retrieval-augmented verification, producing traceable, evidence-backed judgments for peer review.
ACL 2025
A framework for training generally capable agents across 14 environments and 89 tasks. Agents autonomously explore and evolve without step-by-step supervision, achieving an average success rate of 82.4%.
Preprint 2023
One of the earliest and most comprehensive surveys on LLM-based agents — covering perception, reasoning, planning, and action — widely cited as a foundational reference in the agent research community.
Alignment & Reinforcement Learning
Preprint 2025
Reveals that models trained to distinguish reference from target policies can serve as general-purpose reward models — a new paradigm for scalable RLHF without explicit reward annotation.
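A minimal sketch of the underlying idea (my illustration, not the paper's exact formulation): a model that scores how much more likely a response is under a target policy than under a reference policy implicitly defines a reward, via the DPO-style log-ratio r(x, y) = β · [log π(y|x) − log π_ref(y|x)]. The function name and β value below are illustrative.

```python
def implicit_reward(logp_target, logp_ref, beta=0.1):
    """Score a response by the summed per-token log-prob gap
    between the target and reference policies, scaled by beta."""
    return beta * sum(t - r for t, r in zip(logp_target, logp_ref))

# Toy per-token log-probs for one response.
lp_target = [-0.5, -1.0, -0.2]
lp_ref = [-0.9, -1.1, -0.8]
print(round(implicit_reward(lp_target, lp_ref), 4))  # → 0.11
```

A response the target policy prefers more strongly than the reference gets a higher reward, with no explicit reward annotation needed.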
Preprint 2025
Balances positive and negative token contributions with adaptive clipping bounds to stabilize off-policy RL training for language models, maintaining entropy while improving learning efficiency.
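One way to picture asymmetric clipping (a sketch under my own assumptions, not the paper's exact objective): give tokens being pushed down (negative advantage) a tighter clipping bound than tokens being pushed up, so neither side dominates the off-policy update. The bound values here are placeholders.

```python
def clipped_token_loss(ratio, advantage, eps_pos=0.2, eps_neg=0.1):
    """PPO-style per-token policy loss with asymmetric clipping:
    negative-advantage tokens use the tighter bound eps_neg."""
    eps = eps_pos if advantage >= 0 else eps_neg
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    # Pessimistic choice (min of the two surrogate objectives), as in PPO.
    return -min(ratio * advantage, clipped * advantage)

print(clipped_token_loss(2.0, 1.0))   # positive token, clipped at 1.2
print(clipped_token_loss(2.0, -1.0))  # negative token, unclipped side is worse
```

With a shared symmetric bound, large negative-advantage updates can collapse entropy; separating the two bounds is one simple lever against that.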
Preprint 2024
A mixture-of-experts framework using multiple LoRA adapters with a learned router to preserve world knowledge during instruction tuning — keeping broad capabilities while gaining task-specific skills.
Preprint 2024
Tackles two fundamental problems in reward modeling: handling noisy human preference data and improving training robustness. Proposes methods to detect and correct label noise in preference annotations.
Preprint 2023
A pioneering empirical study of PPO instability in RLHF. Introduces PPO-max with token-level KL penalty for consistent alignment training. Open-sourced as MOSS-RLHF.
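The token-level KL penalty can be sketched as follows (an illustration of the general recipe, not the released MOSS-RLHF code): every token is penalized by β times its log-ratio to the reference policy, and the sequence-level task reward is added at the final token.

```python
def kl_shaped_rewards(logp_policy, logp_ref, task_reward, beta=0.05):
    """Per-token rewards: -beta * (log pi - log pi_ref) at each token,
    with the scalar task reward added on the final token."""
    rewards = [-beta * (p - r) for p, r in zip(logp_policy, logp_ref)]
    rewards[-1] += task_reward
    return rewards

# Toy example: the first token drifted from the reference and is penalized.
print(kl_shaped_rewards([-1.0, -2.0], [-1.5, -2.0], task_reward=1.0))
```

Shaping the reward per token, rather than applying one sequence-level KL term, gives the critic a denser signal and is one of the knobs that stabilizes PPO in this setting.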
Multimodal & Systems
Preprint 2026
Enables humanoid robots to execute complex whole-body movements from natural language commands, bridging language understanding and motor control.
AAAI 2026
Converts standard multi-head attention into DeepSeek's economical multi-head latent attention for any transformer, dramatically reducing KV-cache and inference cost without retraining.
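The core trick behind such a conversion can be sketched with a toy low-rank factorization (my illustration of the general idea, not the paper's actual procedure): factor the stacked key/value projections through a shared low-rank latent via truncated SVD, so the cache only stores the small latent instead of full keys and values.

```python
import numpy as np

def lowrank_kv_factors(w_k, w_v, rank):
    """Factor stacked K/V projections W = [W_k; W_v] through a rank-r
    latent: W ~= W_up @ W_down, so the cache holds c = W_down @ h."""
    w = np.vstack([w_k, w_v])                 # (2*d_out, d_model)
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    w_up = u[:, :rank] * s[:rank]             # (2*d_out, r)
    w_down = vt[:rank]                        # (r, d_model)
    return w_up, w_down

rng = np.random.default_rng(0)
d_model, d_out = 8, 4
# A KV map that is genuinely rank-2, so rank-2 factors recover it exactly.
base = rng.normal(size=(2 * d_out, 2)) @ rng.normal(size=(2, d_model))
w_k, w_v = base[:d_out], base[d_out:]
w_up, w_down = lowrank_kv_factors(w_k, w_v, rank=2)
print(np.allclose(w_up @ w_down, base))  # → True
```

Caching the r-dimensional latent instead of per-head keys and values is where the KV-cache savings come from; the real conversion must also handle per-head structure and positional encodings.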
Preprint 2025
A reproducible framework for generating full-length songs with fine-grained control over style, structure, and instrumentation — supporting long-form coherent music creation.
Preprint 2024
An out-of-the-box multi-language sandbox providing unified feedback from compilers and static analysis tools, enabling LLMs to write, execute, and debug code across languages.
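The basic sandbox loop can be sketched in a few lines (a generic illustration of the pattern, not this project's API): run a snippet in a fresh process with a timeout and return unified feedback the model can read to debug its own code.

```python
import subprocess
import sys

def run_in_sandbox(code, timeout=5):
    """Execute a Python snippet in a separate interpreter process and
    return exit code, stdout, and stderr as unified feedback."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True, timeout=timeout,
        )
        return {"exit_code": proc.returncode,
                "stdout": proc.stdout, "stderr": proc.stderr}
    except subprocess.TimeoutExpired:
        return {"exit_code": None, "stdout": "", "stderr": "timeout"}

print(run_in_sandbox("print(1 + 1)")["stdout"].strip())  # → 2
```

A real sandbox adds language-specific toolchains, resource limits, and isolation, but the write-execute-observe loop is the interface the LLM sees.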
ACL 2021
A comprehensive toolkit for evaluating model robustness through automated transformations, adversarial attacks, and subpopulation analysis across 13 NLP tasks and multiple languages.

Full publication list on Google Scholar →

Contact
Office
Room 242, Environmental Science Bldg
220 Handan Rd, Shanghai
Lab
Room A3011, Intersection Bldg No.2
2005 Songhu Rd, Shanghai
Links
Scholar