A survey on data selection for language models

A Albalak, Y Elazar, SM Xie, S Longpre… - arXiv preprint arXiv …, 2024 - arxiv.org
A major factor in the recent success of large language models is the use of enormous and
ever-growing text datasets for unsupervised pre-training. However, naively training a model …

A survey of confidence estimation and calibration in large language models

J Geng, F Cai, Y Wang, H Koeppl, P Nakov… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) have demonstrated remarkable capabilities across a wide
range of tasks in various domains. Despite their impressive performance, they can be …

A survey of large language models

WX Zhao, K Zhou, J Li, T Tang… - arXiv preprint arXiv …, 2023 - paper-notes.zhjwpku.com
Ever since the Turing Test was proposed in the 1950s, humans have explored how machines
can master language intelligence. Language is essentially a complex, intricate system of …

Aligning large language models with human: A survey

Y Wang, W Zhong, L Li, F Mi, X Zeng, W Huang… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) trained on extensive textual corpora have emerged as
leading solutions for a broad array of Natural Language Processing (NLP) tasks. Despite …

SLiC-HF: Sequence likelihood calibration with human feedback

Y Zhao, R Joshi, T Liu, M Khalman, M Saleh… - arXiv preprint arXiv …, 2023 - arxiv.org
Learning from human feedback has been shown to be effective at aligning language models
with human preferences. Past work has often relied on Reinforcement Learning from Human …
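
SLiC-HF's core move is to replace RL fine-tuning with a sequence-likelihood calibration objective: a hinge loss that pushes the model to assign higher sequence log-likelihood to the human-preferred response than to the rejected one, plus a regularizer that keeps the model close to its SFT starting point. Below is a minimal sketch of that idea; the margin `delta`, the weight `lam`, the choice of regularizer, and all function names are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def slic_style_loss(logp_pos: torch.Tensor,
                    logp_neg: torch.Tensor,
                    logp_ref_targets: torch.Tensor,
                    delta: float = 1.0,
                    lam: float = 0.5) -> torch.Tensor:
    """logp_pos / logp_neg: sequence log-likelihoods of the preferred and
    dispreferred responses under the model being calibrated, shape (batch,).
    logp_ref_targets: log-likelihood of reference (SFT) targets, used here
    as a simple cross-entropy regularizer toward the starting model.
    """
    # Hinge margin: preferred response should out-score the rejected one
    # by at least `delta` in sequence log-likelihood.
    calibration = F.relu(delta - logp_pos + logp_neg).mean()
    # Cross-entropy on reference targets discourages drift from SFT behavior.
    regularization = -logp_ref_targets.mean()
    return calibration + lam * regularization
```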

RRHF: Rank responses to align language models with human feedback without tears

Z Yuan, H Yuan, C Tan, W Wang, S Huang… - arXiv preprint arXiv …, 2023 - arxiv.org
Reinforcement Learning from Human Feedback (RLHF) facilitates the alignment of large
language models with human preferences, significantly enhancing the quality of interactions …
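
RRHF's central idea is simpler than PPO-style RLHF: score k candidate responses with a reward model, then train with a pairwise ranking loss over the model's length-normalized sequence log-probabilities, plus a standard cross-entropy term on the best-scoring response. The sketch below illustrates this; the tensor shapes, the length-normalization choice, and the unweighted sum of the two terms are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def rrhf_style_loss(seq_logprobs: torch.Tensor,
                    rewards: torch.Tensor) -> torch.Tensor:
    """seq_logprobs: (k,) length-normalized log P(y_i | x) under the policy.
    rewards: (k,) reward-model scores for the k candidate responses.
    """
    k = seq_logprobs.size(0)
    rank_loss = seq_logprobs.new_zeros(())
    # Ranking term: penalize any pair where a lower-reward response is
    # scored higher by the model than a higher-reward one.
    for i in range(k):
        for j in range(k):
            if rewards[i] < rewards[j]:
                rank_loss = rank_loss + F.relu(seq_logprobs[i] - seq_logprobs[j])
    # SFT term: negative log-likelihood of the best-scoring response keeps
    # the policy anchored to known-good outputs.
    best = torch.argmax(rewards)
    sft_loss = -seq_logprobs[best]
    return rank_loss + sft_loss
```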

Large language model alignment: A survey

T Shen, R Jin, Y Huang, C Liu, W Dong, Z Guo… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent years have witnessed remarkable progress made in large language models (LLMs).
Such advancements, while garnering significant attention, have concurrently elicited various …

RRHF: Rank responses to align language models with human feedback

H Yuan, Z Yuan, C Tan, W Wang… - Advances in Neural …, 2023 - proceedings.neurips.cc
Reinforcement Learning from Human Feedback (RLHF) facilitates the alignment of
large language models with human preferences, significantly enhancing the quality of …

Statistical rejection sampling improves preference optimization

T Liu, Y Zhao, R Joshi, M Khalman, M Saleh… - arXiv preprint arXiv …, 2023 - arxiv.org
Improving the alignment of language models with human preferences remains an active
research challenge. Previous approaches have primarily utilized Reinforcement Learning …
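
This paper (RSO) proposes statistical rejection sampling: draw candidates from the SFT policy, then accept each with probability exp((r(y) - r_max) / beta), so that accepted samples approximate the reward-tilted optimal policy proportional to pi_sft(y|x) * exp(r(x, y) / beta); the accepted samples then feed a preference-optimization loss. The sketch below shows only the acceptance step; sampling with replacement, the default `beta`, and the function names are simplifying assumptions.

```python
import math
import random

def rejection_sample(candidates: list[str],
                     rewards: list[float],
                     beta: float = 0.5,
                     num_accept: int = 4) -> list[str]:
    """candidates: responses drawn from the SFT policy for one prompt.
    rewards: reward-model scores r(x, y) for each candidate.
    Accepting y with probability exp((r(y) - r_max) / beta) approximates
    sampling from pi_sft(y|x) * exp(r(x, y) / beta), up to normalization.
    """
    r_max = max(rewards)
    accepted: list[str] = []
    while len(accepted) < num_accept:
        i = random.randrange(len(candidates))
        if random.random() < math.exp((rewards[i] - r_max) / beta):
            accepted.append(candidates[i])
    return accepted
```

A smaller `beta` tilts acceptance more sharply toward high-reward candidates at the cost of more rejections, which is the usual sharpness/efficiency trade-off in rejection sampling.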

OmniVec: Learning robust representations with cross modal sharing

S Srivastava, G Sharma - Proceedings of the IEEE/CVF …, 2024 - openaccess.thecvf.com
The majority of research on learning-based methods has focused on designing and training
networks for specific tasks. However, many learning-based tasks across modalities …