A survey on data selection for language models

A Albalak, Y Elazar, SM Xie, S Longpre… - arXiv preprint arXiv …, 2024 - arxiv.org
A major factor in the recent success of large language models is the use of enormous and
ever-growing text datasets for unsupervised pre-training. However, naively training a model …

A survey of confidence estimation and calibration in large language models

J Geng, F Cai, Y Wang, H Koeppl, P Nakov… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) have demonstrated remarkable capabilities across a wide
range of tasks in various domains. Despite their impressive performance, they can be …

A survey of large language models

WX Zhao, K Zhou, J Li, T Tang… - arXiv preprint arXiv …, 2023 - paper-notes.zhjwpku.com
Ever since the Turing Test was proposed in the 1950s, humans have explored how machines
can master language intelligence. Language is essentially a complex, intricate system of …

Aligning large language models with human: A survey

Y Wang, W Zhong, L Li, F Mi, X Zeng, W Huang… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) trained on extensive textual corpora have emerged as
leading solutions for a broad array of Natural Language Processing (NLP) tasks. Despite …

SLiC-HF: Sequence likelihood calibration with human feedback

Y Zhao, R Joshi, T Liu, M Khalman, M Saleh… - arXiv preprint arXiv …, 2023 - arxiv.org
Learning from human feedback has been shown to be effective at aligning language models
with human preferences. Past work has often relied on Reinforcement Learning from Human …
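
SLiC-HF's core move is to replace RL fine-tuning with a sequence-likelihood calibration objective: a hinge loss that pushes the model to assign higher sequence log-likelihood to the human-preferred response than to the rejected one, plus a regularizer that keeps the model close to its SFT starting point. Below is a minimal sketch of that idea; the margin `delta`, the weight `lam`, the choice of regularizer, and all function names are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def slic_style_loss(logp_pos: torch.Tensor,
                    logp_neg: torch.Tensor,
                    logp_ref_targets: torch.Tensor,
                    delta: float = 1.0,
                    lam: float = 0.5) -> torch.Tensor:
    """logp_pos / logp_neg: sequence log-likelihoods of the preferred and
    dispreferred responses under the model being calibrated, shape (batch,).
    logp_ref_targets: log-likelihood of reference (SFT) targets, used here
    as a simple cross-entropy regularizer toward the starting model.
    """
    # Hinge margin: preferred response should out-score the rejected one
    # by at least `delta` in sequence log-likelihood.
    calibration = F.relu(delta - logp_pos + logp_neg).mean()
    # Cross-entropy on reference targets discourages drift from SFT behavior.
    regularization = -logp_ref_targets.mean()
    return calibration + lam * regularization
```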

RRHF: Rank responses to align language models with human feedback without tears

Z Yuan, H Yuan, C Tan, W Wang, S Huang… - arXiv preprint arXiv …, 2023 - arxiv.org
Reinforcement Learning from Human Feedback (RLHF) facilitates the alignment of large
language models with human preferences, significantly enhancing the quality of interactions …
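
RRHF's central idea is simpler than PPO-style RLHF: score k candidate responses with a reward model, then train with a pairwise ranking loss over the model's length-normalized sequence log-probabilities, plus a standard cross-entropy term on the best-scoring response. The sketch below illustrates this; the tensor shapes, the length-normalization choice, and the unweighted sum of the two terms are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def rrhf_style_loss(seq_logprobs: torch.Tensor,
                    rewards: torch.Tensor) -> torch.Tensor:
    """seq_logprobs: (k,) length-normalized log P(y_i | x) under the policy.
    rewards: (k,) reward-model scores for the k candidate responses.
    """
    k = seq_logprobs.size(0)
    rank_loss = seq_logprobs.new_zeros(())
    # Ranking term: penalize any pair where a lower-reward response is
    # scored higher by the model than a higher-reward one.
    for i in range(k):
        for j in range(k):
            if rewards[i] < rewards[j]:
                rank_loss = rank_loss + F.relu(seq_logprobs[i] - seq_logprobs[j])
    # SFT term: negative log-likelihood of the best-scoring response keeps
    # the policy anchored to known-good outputs.
    best = torch.argmax(rewards)
    sft_loss = -seq_logprobs[best]
    return rank_loss + sft_loss
```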

Large language model alignment: A survey

T Shen, R Jin, Y Huang, C Liu, W Dong, Z Guo… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent years have witnessed remarkable progress made in large language models (LLMs).
Such advancements, while garnering significant attention, have concurrently elicited various …

RRHF: Rank responses to align language models with human feedback

H Yuan, Z Yuan, C Tan, W Wang… - Advances in Neural …, 2023 - proceedings.neurips.cc
Reinforcement Learning from Human Feedback (RLHF) facilitates the alignment of
large language models with human preferences, significantly enhancing the quality of …

Statistical rejection sampling improves preference optimization

T Liu, Y Zhao, R Joshi, M Khalman, M Saleh… - arXiv preprint arXiv …, 2023 - arxiv.org
Improving the alignment of language models with human preferences remains an active
research challenge. Previous approaches have primarily utilized Reinforcement Learning …
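
This paper (RSO) proposes statistical rejection sampling: draw candidates from the SFT policy, then accept each with probability exp((r(y) - r_max) / beta), so that accepted samples approximate the reward-tilted optimal policy proportional to pi_sft(y|x) * exp(r(x, y) / beta); the accepted samples then feed a preference-optimization loss. The sketch below shows only the acceptance step; sampling with replacement, the default `beta`, and the function names are simplifying assumptions.

```python
import math
import random

def rejection_sample(candidates: list[str],
                     rewards: list[float],
                     beta: float = 0.5,
                     num_accept: int = 4) -> list[str]:
    """candidates: responses drawn from the SFT policy for one prompt.
    rewards: reward-model scores r(x, y) for each candidate.
    Accepting y with probability exp((r(y) - r_max) / beta) approximates
    sampling from pi_sft(y|x) * exp(r(x, y) / beta), up to normalization.
    """
    r_max = max(rewards)
    accepted: list[str] = []
    while len(accepted) < num_accept:
        i = random.randrange(len(candidates))
        if random.random() < math.exp((rewards[i] - r_max) / beta):
            accepted.append(candidates[i])
    return accepted
```

A smaller `beta` tilts acceptance more sharply toward high-reward candidates at the cost of more rejections, which is the usual sharpness/efficiency trade-off in rejection sampling.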

OmniVec: Learning robust representations with cross modal sharing

S Srivastava, G Sharma - Proceedings of the IEEE/CVF …, 2024 - openaccess.thecvf.com
The majority of research on learning-based methods has focused on designing and training
networks for specific tasks. However, many learning-based tasks across modalities …