Publications

You can find my full list of publications on my Google Scholar profile.

Below are my first-author publications.

Do Not Let Low-Probability Tokens Over-Dominate in RL for LLMs
Zhihe Yang, Xufang Luo, Zilong Wang, Dongqi Han, Zhiyuan He, Dongsheng Li, Yunjian Xu.
Under review as a conference paper

We identify the over-dominance of low-probability tokens in RL training for LLMs and propose two methods to mitigate it, which improve the performance of RL-trained LLMs across various models and datasets.

Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key
Zhihe Yang, Xufang Luo, Dongqi Han, Yunjian Xu, Dongsheng Li.
Accepted by CVPR 2025
Oral Presentation acceptance rate: 0.74% (96/13008)

To address the inefficiency of concurrent DPO-based algorithms in mitigating hallucinations in large vision-language models, we propose the OPA-DPO framework, which uses on-policy aligned expert feedback to make learning more effective.

Q-Supervised Contrastive Representation: A State Decoupling Framework for Safe Offline Reinforcement Learning
Zhihe Yang, Yunjian Xu, Yang Zhang.
Accepted by ICML 2025

To address the out-of-distribution (OOD) issue during testing in safe offline RL, we propose the first framework that decouples global observations into reward- and cost-related representations through Q-supervised contrastive learning for decision-making.

DMBP: Diffusion Model Based Predictor for Robust Offline Reinforcement Learning Against State Observation Perturbations
Zhihe Yang, Yunjian Xu.
Accepted by ICLR 2024

For state-based reinforcement learning tasks with state observation perturbations, we propose a new framework that recovers the actual states with offline-trained conditional diffusion models.

3D printing of short fiber reinforced composites via material extrusion: Fiber breakage
Zhihe Yang, Zeshi Yang, Hui Chen, Wentao Yan.
Accepted by Additive Manufacturing (IF 11.0)

We study fiber breakage and orientation change in the extrusion and deposition stages of material extrusion (MEX) and report a vortex at the front of the nozzle outlet that leads to worse fiber alignment.