Towards a unified view of preference learning for large language models: A survey

B Gao, F Song, Y Miao, Z Cai, Z Yang, L Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) exhibit remarkably powerful capabilities. One of the crucial
factors behind this success is aligning the LLM's output with human preferences. This …

TreeBoN: Enhancing inference-time alignment with speculative tree-search and best-of-N sampling

J Qiu, Y Lu, Y Zeng, J Guo, J Geng, H Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Inference-time alignment enhances the performance of large language models without
requiring additional training or fine-tuning but presents challenges due to balancing …
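
The Best-of-N baseline this title builds on is simple to state; below is a minimal, generic sketch (sample N completions, score them with a reward model, keep the best), not TreeBoN's speculative tree-search variant. `generate_fn`, `reward_fn`, and the default `n` are hypothetical stand-ins for an LLM sampler and a trained reward model.

```python
# Minimal sketch of vanilla best-of-N sampling, assuming hypothetical
# `generate_fn` (draws one completion) and `reward_fn` (scores a
# prompt/completion pair); TreeBoN's speculative tree-search is not shown.
from typing import Callable, List


def best_of_n(
    prompt: str,
    generate_fn: Callable[[str], str],
    reward_fn: Callable[[str, str], float],
    n: int = 16,
) -> str:
    # Draw N independent candidate completions for the prompt.
    candidates: List[str] = [generate_fn(prompt) for _ in range(n)]
    # Score each candidate with the reward model.
    scores = [reward_fn(prompt, c) for c in candidates]
    # Return the completion the reward model prefers most.
    return candidates[max(range(n), key=lambda i: scores[i])]
```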

Cascade reward sampling for efficient decoding-time alignment

B Li, Y Wang, A Grama, R Zhang - arXiv preprint arXiv:2406.16306, 2024 - arxiv.org
Aligning large language models (LLMs) with human preferences is critical for their
deployment. Recently, decoding-time alignment has emerged as an effective plug-and-play …

Inference-time language model alignment via integrated value guidance

Z Liu, Z Zhou, Y Wang, C Yang, Y Qiao - arXiv preprint arXiv:2409.17819, 2024 - arxiv.org
Large language models are typically fine-tuned to align with human preferences, but tuning
large models is computationally intensive and complex. In this work, we introduce …

Towards building specialized generalist ai with system 1 and system 2 fusion

K Zhang, B Qi, B Zhou - arXiv preprint arXiv:2407.08642, 2024 - arxiv.org
In this perspective paper, we introduce the concept of Specialized Generalist Artificial
Intelligence (SGAI or simply SGI) as a crucial milestone toward Artificial General Intelligence …

Attack Atlas: A Practitioner's Perspective on Challenges and Pitfalls in Red Teaming GenAI

A Rawat, S Schoepf, G Zizzo, G Cornacchia… - arXiv preprint arXiv …, 2024 - arxiv.org
As generative AI, particularly large language models (LLMs), becomes increasingly
integrated into production applications, new attack surfaces and vulnerabilities emerge and …

Decoding-time Realignment of Language Models

T Liu, S Guo, L Bianco, D Calandriello… - arXiv preprint arXiv …, 2024 - arxiv.org
Aligning language models with human preferences is crucial for reducing errors and biases
in these models. Alignment techniques, such as reinforcement learning from human …

Multi-Objective Alignment of Large Language Models Through Hypervolume Maximization

S Mukherjee, A Lalitha, S Sengupta… - arXiv preprint arXiv …, 2024 - arxiv.org
Multi-objective alignment from human feedback (MOAHF) in large language models (LLMs)
is a challenging problem as human preferences are complex, multifaceted, and often …
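
For readers unfamiliar with the objective named in this title: the hypervolume of a set of objective vectors is the volume they jointly dominate relative to a reference point, so maximizing it pushes solutions toward a broad Pareto front. Below is a minimal 2-D sketch of the metric itself, not the paper's optimization procedure; the maximization convention, the reference point, and the example reward axes are assumptions.

```python
# Generic 2-D hypervolume (area dominated above a reference point),
# assuming both objectives are to be maximized. Illustrative only; this
# is the metric, not the paper's alignment algorithm.
from typing import List, Tuple


def hypervolume_2d(points: List[Tuple[float, float]],
                   ref: Tuple[float, float]) -> float:
    # Keep only points that strictly dominate the reference point.
    pts = [p for p in points if p[0] > ref[0] and p[1] > ref[1]]
    # Sweep in decreasing order of the first objective.
    pts.sort(key=lambda p: p[0], reverse=True)
    area, best_y = 0.0, ref[1]
    for x, y in pts:
        if y > best_y:  # dominated points contribute nothing
            area += (x - ref[0]) * (y - best_y)
            best_y = y
    return area


# Example: two non-dominated reward vectors, e.g. (helpfulness, harmlessness).
print(hypervolume_2d([(3.0, 1.0), (1.0, 3.0)], ref=(0.0, 0.0)))  # 5.0
```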

Towards Inference-time Category-wise Safety Steering for Large Language Models

A Bhattacharjee, S Ghosh, T Rebedea… - arXiv preprint arXiv …, 2024 - arxiv.org
While large language models (LLMs) have seen unprecedented advancements in
capabilities and applications across a variety of use-cases, safety alignment of these models …

A Moral Imperative: The Need for Continual Superalignment of Large Language Models

G Puthumanaillam, M Vora, P Thangeda… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper examines the challenges associated with achieving life-long superalignment in
AI systems, particularly large language models (LLMs). Superalignment is a theoretical …