How to Leverage Demonstration Data in Alignment for Large Language Model? A Self-Imitation Learning Perspective

T Xiao, M Li, Y Yuan, H Zhu, C Cui… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper introduces a novel generalized self-imitation learning (GSIL)
framework, which effectively and efficiently aligns large language models with offline …

Fact-Level Confidence Calibration and Self-Correction

Y Yuan, B Xu, H Tan, F Sun, T Xiao, W Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Confidence calibration in LLMs, i.e., aligning their self-assessed confidence with the actual
accuracy of their responses, enables them to self-evaluate the correctness of their outputs …
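The snippet defines calibration as agreement between self-assessed confidence and actual accuracy. A standard way to quantify that gap is Expected Calibration Error (ECE); the sketch below is the generic metric, not the paper's fact-level method, and the bin count is an arbitrary illustrative choice:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin predictions by confidence, then take the size-weighted mean of
    |accuracy(bin) - mean confidence(bin)|. Zero means perfectly calibrated."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if not mask.any():
            continue
        ece += mask.mean() * abs(correct[mask].mean() - confidences[mask].mean())
    return ece
```

For example, four answers each reported at 0.9 confidence but only three of them correct (accuracy 0.75) give an ECE of 0.15.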

SimPER: A Minimalist Approach to Preference Alignment without Hyperparameters

T Xiao, Y Yuan, Z Chen, M Li, S Liang, Z Ren… - arXiv preprint arXiv …, 2025 - arxiv.org
Existing preference optimization objectives for language model alignment require additional
hyperparameters that must be extensively tuned to achieve optimal performance, increasing …
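The snippet's point is that objectives like DPO carry tunable hyperparameters (e.g. a temperature beta and a reference model), while SimPER removes them. A minimal sketch of a hyperparameter-free, perplexity-based preference loss in the spirit the title suggests; the function names are illustrative and this is not claimed to be the paper's exact objective:

```python
import math

def inverse_perplexity(token_logps):
    """exp(mean token log-prob) = 1 / perplexity of a response under the policy."""
    return math.exp(sum(token_logps) / len(token_logps))

def perplexity_preference_loss(chosen_logps, rejected_logps):
    """Raise the inverse perplexity of the chosen response and lower that of the
    rejected one. No beta, no reference model, nothing to tune."""
    return -inverse_perplexity(chosen_logps) + inverse_perplexity(rejected_logps)
```

When the policy already prefers the chosen response (higher per-token log-probability), the loss is negative; when preferences are inverted, it is positive, so gradient descent pushes probability mass toward the chosen response.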

GeomCLIP: Contrastive Geometry-Text Pre-training for Molecules

T Xiao, C Cui, H Zhu… - 2024 IEEE International …, 2024 - ieeexplore.ieee.org
Pretraining molecular representations is crucial for drug and material discovery. Recent
methods focus on learning representations from geometric structures, effectively capturing …
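The title names contrastive geometry-text pretraining; the standard backbone for such paired-modality training is a symmetric InfoNCE loss in which matched (geometry, text) embeddings are positives and all other pairs in the batch are negatives. A NumPy sketch of that generic loss, assuming encoders that are not shown and an arbitrary temperature; this is CLIP-style boilerplate, not GeomCLIP's specific architecture:

```python
import numpy as np

def symmetric_info_nce(geom_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired embeddings: cross-entropy toward
    the diagonal of the similarity matrix, averaged over both directions."""
    g = geom_emb / np.linalg.norm(geom_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = g @ t.T / temperature              # (B, B) cosine similarities
    idx = np.arange(len(g))

    def xent_diag(l):
        l = l - l.max(axis=1, keepdims=True)    # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[idx, idx].mean()           # matched pairs sit on the diagonal

    return 0.5 * (xent_diag(logits) + xent_diag(logits.T))
```

Perfectly aligned pairs drive the loss toward zero; shuffled pairings drive it up, which is what pulls the two modalities into a shared space during pretraining.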

MITA: Bridging the Gap between Model and Data for Test-time Adaptation

Y Yuan, B Xu, T Xiao, L Hou, F Sun, H Shen… - arXiv preprint arXiv …, 2024 - arxiv.org
Test-Time Adaptation (TTA) has emerged as a promising paradigm for enhancing the
generalizability of models. However, existing mainstream TTA methods, predominantly …
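The snippet cuts off before naming the mainstream TTA methods it contrasts against; a common family (e.g. Tent) adapts at test time by taking gradient steps that minimize the entropy of the model's own predictions, updating only a few parameters such as normalization statistics. A sketch of that objective alone, as generic background rather than anything from MITA:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)        # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def prediction_entropy(logits):
    """Mean Shannon entropy of the predictive distribution over a test batch.
    Entropy-minimizing TTA descends this quantity w.r.t. a small parameter subset."""
    p = softmax(np.asarray(logits, dtype=float))
    return -(p * np.log(p + 1e-12)).sum(axis=1).mean()
```

Confident predictions have low entropy, so minimizing this term sharpens the model's outputs on the shifted test distribution without using any labels.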