How to Leverage Demonstration Data in Alignment for Large Language Model? A Self-Imitation Learning Perspective
This paper introduces a novel generalized self-imitation learning ($\textbf{GSIL}$)
framework, which effectively and efficiently aligns large language models with offline …
Fact-Level Confidence Calibration and Self-Correction
Confidence calibration in LLMs, i.e., aligning their self-assessed confidence with the actual
accuracy of their responses, enabling them to self-evaluate the correctness of their outputs …
SimPER: A Minimalist Approach to Preference Alignment without Hyperparameters
Existing preference optimization objectives for language model alignment require additional
hyperparameters that must be extensively tuned to achieve optimal performance, increasing …
GeomCLIP: Contrastive Geometry-Text Pre-training for Molecules
Pretraining molecular representations is crucial for drug and material discovery. Recent
methods focus on learning representations from geometric structures, effectively capturing …
MITA: Bridging the Gap between Model and Data for Test-time Adaptation
Test-Time Adaptation (TTA) has emerged as a promising paradigm for enhancing the
generalizability of models. However, existing mainstream TTA methods, predominantly …