Chonghan Liu

I am an LLM Algorithm Engineer. My research interests include LLM Inference Optimization, Reinforcement Learning for LLMs, and Multimodal Large Language Models.

I received my B.S. in Computer Science from Nanjing University, and dropped out of the M.S. program at UCLA to work full-time in AI. I am a collaborator on LLaMA-Factory and an active contributor to NVIDIA-NeMo/Automodel.

Google Scholar / GitHub / X / Blog / Email

Publications [all Papers →]

	Rethinking LLM Ensembling from the Perspective of Mixture Models (Spotlight). Jiale Fu, Yuchu Jiang, Chonghan Liu, Joey Tianyi Zhou, Xu Yang. ICML 2026. [paper]
	Ariadne: A Controllable Framework for Probing and Extending VLM Reasoning Boundaries (Poster). Minghe Shen, Zhuo Zhi, Chonghan Liu, Shuo Xing, Zhengzhong Tu, Che Liu. ACL Main 2026. [paper]
	d²Cache: Accelerating Diffusion-Based LLMs via Dual Adaptive Caching (Poster). Yuchu Jiang, Yue Cai, Xiangzhong Luo, Jiale Fu, Jiarui Wang, Chonghan Liu, Xu Yang. ICLR 2026. [paper] [code]
	Flatter Tokens are More Valuable for Speculative Draft Model Training (Poster). Jiaming Fan, Cao Daming, Xiangzhong Luo, Jiale Fu, Chonghan Liu, Xu Yang. ICLR 2026. [paper]
	AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence (Poster). Yuliang Liu, Junjie Lu, Zhaoling Chen, Chaofeng Qu, Jason Klein Liu, Chonghan Liu, Zefan Cai, Yunhui Xia, Li Zhao, Jiang Bian, Chuheng Zhang, Wei Shen, Zhouhan Lin. ICML 2025. [paper]
	VEPO: Variable Entropy Policy Optimization for Low-Resource Language Foundation Models. Chonghan Liu, Yimin Du, Qi An, Xin He, Cunqi Zhai, Fei Tan, Weijia Lin, Xiaochun Gong, Yongchao Deng, Shousheng Jia, ... arXiv preprint arXiv:2603.19152, 2026. [paper]
	Mirage: A Multi-modal Benchmark for Spatial Perception, Reasoning, and Intelligence. Chonghan Liu, Haoran Wang, Frank Henry, Peng Miao, Yifan Zhang, Yue Zhao, Peiran Wu. arXiv preprint arXiv:2505.10604, 2025. [paper]

Open Source [all PRs →]

NVIDIA-NeMo/Automodel: PyTorch-native LLM/VLM training framework

Speculative decoding (EAGLE / DFlash / DSpark / Domino / JetSpec): grew this from a single Llama EAGLE-1 recipe into Automodel's full draft-model training stack — P-EAGLE parallel drafting, DFlash, DeepSeek V4 DSpark, Domino online training, and JetSpec causal parallel drafting, plus a vLLM/SGLang serving bridge and an end-to-end tutorial

Model & MoE support: added training support for DeepSeek V4 Flash (plus Multi-Token Prediction), Hy3-preview, and Hy-MT2-30B-A3B, plus Rollout Routing Replay (R3) for MoE RL training

Distributed & VLM training: TP+PP and Context Parallelism for Gemma4 VLM, TransformerEngine attention injection into HF models, and fixes for NaN loss, activation checkpointing, multi-image token expansion, and mRoPE under packing

Knowledge distillation: VLM KD recipe with chunked loss and teacher-model offload

Agent / tool-calling SFT: multi-turn dataset adapter, turn-aware history truncation, reasoning-content masking, tool-call accuracy evaluator, and an SFT tutorial

NVIDIA-NeMo/Megatron-Bridge: bridge between HF checkpoints and Megatron-Core training

Model bridges: added a MiniMax M3 language-model bridge and recipes, and fixed router expert_bias mapping across the DeepSeek family and the wrong GELU variant in the Gemma-1 bridge

Training & data correctness: fixed a training_log argument-misbinding bug, stale non-persistent checkpoints surviving most_recent_k=0, an Energon test split silently aliasing validation, a MegatronMIMO padding mask keyed on token value instead of position, and a missing pipeline-parallelism guard in QwenVLInferenceWrapper

verl-project/verl: RLHF training framework

Added Gemma4 FSDP SFT training support and an ExceptionDumpManager to preserve DataProto on trainer crash, plus fixes for mbridge optimizer config propagation, MultiTurnSFTDataset serialization, multimodal chat_template sync, and SFT trainer token-count metrics

hiyouga/LlamaFactory: Unified efficient fine-tuning framework for 100+ LLMs & VLMs

Collaborator; I maintain multimodal (VLM) training support, including a fix for VLM training hangs under ZeRO-3/FSDP via dummy-image injection, lazy loading of multimodal inputs to cut preprocessing disk usage, Qwen2-VL mRoPE fixes, ViT gradient checkpointing, and VLM utility fixes. That work later moved to EasyR1.

Added support for a second ShareGPT conversation format and a single-GPU full-parameter batch prediction example, plus a TRL PPOv2 implementation reference

chatchat-space/Langchain-Chatchat: RAG & agent app framework over local LLMs

Built the pluggable model-provider callback/validator architecture and its core provider/model manager, including Xinference integration