Publications

A full list of my publications is available on my Google Scholar profile.

Below are my first-author publications (* denotes equal contribution).

ADG: Ambient Diffusion-Guided Dataset Recovery for Corruption-Robust Offline Reinforcement Learning
Zeyuan Liu*, Zhihe Yang*, Jiawei Xu*, Rui Yang, Jiafei Lyu, Baoxiang Wang, Yunjian Xu, Xiu Li
Accepted by NeurIPS 2025

We introduce ADG, a diffusion-guided dataset recovery pipeline that detects and repairs multi-faceted corruption in high-dimensional offline RL datasets via Ambient DDPMs.

Do Not Let Low-Probability Tokens Over-Dominate in RL for LLMs
Zhihe Yang, Xufang Luo, Zilong Wang, Dongqi Han, Zhiyuan He, Dongsheng Li, Yunjian Xu.
AI4MATH@ICML 2025

We identify the over-dominance of low-probability tokens in RL training for LLMs and propose two methods to counter it, which improve the performance of RL-trained LLMs across various models and datasets.

Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key
Zhihe Yang, Xufang Luo, Dongqi Han, Yunjian Xu, Dongsheng Li.
Accepted by CVPR 2025
Oral presentation (acceptance rate: 0.74%, 96/13008)

To address the inefficiency of existing DPO-based algorithms in mitigating hallucinations in large vision-language models, we propose the OPA-DPO framework, which uses on-policy aligned expert feedback to make learning more effective.

Q-Supervised Contrastive Representation: A State Decoupling Framework for Safe Offline Reinforcement Learning
Zhihe Yang, Yunjian Xu, Yang Zhang.
Accepted by ICML 2025

To address the out-of-distribution (OOD) issue at test time in safe offline RL, we propose the first framework that decouples global observations into reward-related and cost-related representations through Q-supervised contrastive learning for decision-making.

DMBP: Diffusion Model Based Predictor for Robust Offline Reinforcement Learning Against State Observation Perturbations
Zhihe Yang, Yunjian Xu.
Accepted by ICLR 2024

For state-based reinforcement learning tasks with state observation perturbations, we propose a new framework that recovers the actual states with offline-trained conditional diffusion models.

3D printing of short fiber reinforced composites via material extrusion: Fiber breakage
Zhihe Yang, Zeshi Yang, Hui Chen, Wentao Yan.
Accepted by Additive Manufacturing (IF 11.0)

We study fiber breakage and orientation changes in the extrusion and deposition stages of material extrusion (MEX), and report a vortex at the front of the nozzle outlet that leads to worse fiber alignment.