A survey of reinforcement learning from human feedback

T Kaufmann, P Weng, V Bengs… - arXiv preprint arXiv …, 2023 - researchgate.net
Reinforcement learning from human feedback (RLHF) is a variant of reinforcement learning
(RL) that learns from human feedback instead of relying on an engineered reward function …

Arm: Alignment with residual energy-based model

B Pang, C Xiong, Y Zhou - Proceedings of the 2024 Conference of …, 2024 - aclanthology.org
While large language models (LLMs) trained with large-scale unsupervised learning acquire
a wide variety of world knowledge and skills, their behavior does not necessarily align with …

Probing the multi-turn planning capabilities of LLMs via 20 question games

Y Zhang, J Lu, N Jaitly - arXiv preprint arXiv:2310.01468, 2023 - arxiv.org
Large language models (LLMs) are effective at answering questions that are clearly asked.
However, when faced with ambiguous queries, they can act unpredictably and produce …

A Grounded Preference Model for LLM Alignment

T Naseem, G Xu, S Swaminathan… - Findings of the …, 2024 - aclanthology.org
Despite LLMs' recent advancements, they still suffer from factual inconsistency and
hallucination. A commonly adopted remedy is retrieval-augmented generation; however, there is …

A study on improving reasoning in language models

Y Du, A Havrilla, S Sukhbaatar, P Abbeel… - I Can't Believe It's Not …, 2024 - openreview.net
Accurately carrying out complex reasoning is a crucial component of deployable and
reliable language models. While current language models can exhibit this capability with …