Publications

You can find my full list of publications on my Google Scholar profile.

Below are my first-author publications.

	ADG: Ambient Diffusion-Guided Dataset Recovery for Corruption-Robust Offline Reinforcement Learning Zeyuan Liu, Zhihe Yang, Jiawei Xu, Rui Yang, Jiafei Lyu, Baoxiang Wang, Yunjian Xu, Xiu Li Accepted by NeurIPS 2025* We introduce ADG, a diffusion-guided dataset recovery pipeline that detects and repairs multi-faceted corruption in high-dimensional offline RL datasets via Ambient DDPMs.
	Do Not Let Low-Probability Tokens Over-Dominate in RL for LLMs Zhihe Yang, Xufang Luo, Zilong Wang, Dongqi Han, Zhiyuan He, Dongsheng Li, Yunjian Xu. AI4MATH@ICML 2025 Code We identify the over-dominance of low-probability tokens in RL training for LLMs and propose two effective methods to address it, which clearly enhance the performance of RL-trained LLMs across various models and datasets.
	Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key Zhihe Yang, Xufang Luo, Dongqi Han, Yunjian Xu, Dongsheng Li. Accepted by CVPR 2025 Oral Presentation acceptance rate: 0.74% (96/13008) Code To address the inefficiency of concurrent DPO-based algorithms in mitigating hallucinations in large vision-language models, we propose the OPA-DPO framework, which employs on-policy aligned expert feedback to enhance learning effectiveness.
	Q-Supervised Contrastive Representation: A State Decoupling Framework for Safe Offline Reinforcement Learning Zhihe Yang, Yunjian Xu, Yang Zhang. Accepted by ICML 2025 Code To address the OOD issue during testing for safe offline RL, we propose the first framework that decouples the global observations into reward- and cost-related representations through Q-supervised contrastive learning for decision-making.
	DMBP: Diffusion Model Based Predictor for Robust Offline Reinforcement Learning Against State Observation Perturbations Zhihe Yang, Yunjian Xu. Accepted by ICLR 2024 Code For state-based reinforcement learning tasks with state observation perturbations, we propose a new framework that recovers the actual states with offline-trained conditional diffusion models.
	3D printing of short fiber reinforced composites via material extrusion: Fiber breakage Zhihe Yang, Zeshi Yang, Hui Chen, Wentao Yan. Accepted by Additive Manufacturing (IF 11.0) We study fiber breakage and orientation change in the extrusion and deposition stages of material extrusion (MEX) and report a vortex at the front of the nozzle outlet that leads to worse fiber alignment.