Publications
Below are my first-author publications.
We identify the over-dominance of low-probability tokens in RL training for LLMs and propose two methods to counter it, both of which improve the performance of RL-trained LLMs across various models and datasets.
To address the inefficiency of concurrent DPO-based algorithms in mitigating hallucinations in large vision-language models, we propose the OPA-DPO framework, which employs on-policy aligned expert feedback to enhance learning effectiveness.
To address the out-of-distribution (OOD) issue at test time in safe offline RL, we propose the first framework that decouples global observations into reward- and cost-related representations through Q-supervised contrastive learning for decision-making.
For state-based reinforcement learning tasks with state observation perturbations, we propose a new framework that recovers the actual states with offline-trained conditional diffusion models.
We study fiber breakage and orientation change in the extrusion and deposition stages of MEX and report a vortex at the front of the nozzle outlet that leads to worse fiber alignment.