Towards a Unified View of Preference Learning for Large Language Models: A Survey
Large Language Models (LLMs) exhibit remarkably powerful capabilities. One of the crucial
factors behind this success is aligning the LLM's output with human preferences. This …
TreeBoN: Enhancing Inference-time Alignment with Speculative Tree-search and Best-of-N Sampling
Inference-time alignment enhances the performance of large language models without
requiring additional training or fine-tuning but presents challenges due to balancing …
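The snippet is cut off before the method details, but the title names best-of-N (BoN) sampling, the standard baseline this line of work builds on. Below is a minimal sketch of plain BoN only, not the paper's speculative tree-search variant; `generate` and `score` are hypothetical stand-ins for a sampling decoder and a scalar reward model.

```python
def best_of_n(prompt, generate, score, n=8):
    """Plain best-of-N sampling: draw n candidate completions, score
    each with a reward model, and return the highest-scoring one."""
    candidates = [generate(prompt) for _ in range(n)]
    rewards = [score(prompt, c) for c in candidates]
    best_idx = max(range(n), key=lambda i: rewards[i])
    return candidates[best_idx]
```

BoN's cost grows linearly with n, which is exactly the efficiency problem that tree-search and speculative variants target.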
Cascade Reward Sampling for Efficient Decoding-time Alignment
Aligning large language models (LLMs) with human preferences is critical for their
deployment. Recently, decoding-time alignment has emerged as an effective plug-and-play …
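The abstract is truncated before the mechanism, so the sketch below shows only the general idea of reward-gated segmented decoding, one common way to make decoding-time alignment cheaper: score partial generations with the reward model and resample low-reward segments early, instead of scoring only full completions. All names (`generate_segment`, `score`, `is_done`, `threshold`) are illustrative, and this is not necessarily the paper's exact cascade.

```python
def reward_gated_decode(prompt, generate_segment, score, is_done,
                        threshold=0.0, max_retries=4):
    """Generic reward-gated decoding: extend the text one segment at a
    time, resampling a segment (up to max_retries) whenever the reward
    model scores the partial text below a threshold."""
    text = prompt
    while not is_done(text):
        best_seg, best_r = None, float("-inf")
        for _ in range(max_retries):
            seg = generate_segment(text)
            r = score(text + seg)
            if r > best_r:
                best_seg, best_r = seg, r
            if r >= threshold:
                break  # accept early; no need to resample this segment
        text += best_seg
    return text
```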
Inference-time Language Model Alignment via Integrated Value Guidance
Large language models are typically fine-tuned to align with human preferences, but tuning
large models is computationally intensive and complex. In this work, we introduce $\textit{Integrated Value Guidance}$ (IVG) …
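The snippet again stops mid-sentence; as background, value-guided decoding in this family reweights next-token candidates with a value function before committing to a token. A minimal PyTorch sketch, where `value_fn` is a hypothetical scalar value estimate over (context, token) pairs, not IVG's specific combination of implicit and explicit value functions.

```python
import torch
import torch.nn.functional as F

def value_guided_step(logits, value_fn, context, beta=1.0, top_k=20):
    """One decoding step of generic value guidance: rerank the top-k
    next-token candidates by the model's log-probability plus a scaled
    value estimate, then pick the best candidate greedily."""
    log_probs = F.log_softmax(logits, dim=-1)
    topk = torch.topk(log_probs, top_k)
    values = torch.tensor([value_fn(context, t.item()) for t in topk.indices])
    combined = topk.values + beta * values
    return topk.indices[torch.argmax(combined)].item()
```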
Towards Building Specialized Generalist AI with System 1 and System 2 Fusion
In this perspective paper, we introduce the concept of Specialized Generalist Artificial
Intelligence (SGAI or simply SGI) as a crucial milestone toward Artificial General Intelligence …
Attack Atlas: A Practitioner's Perspective on Challenges and Pitfalls in Red Teaming GenAI
As generative AI, particularly large language models (LLMs), becomes increasingly
integrated into production applications, new attack surfaces and vulnerabilities emerge and …
Decoding-time Realignment of Language Models
Aligning language models with human preferences is crucial for reducing errors and biases
in these models. Alignment techniques, such as reinforcement learning from human …
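The truncation hides the method, but the core trick in decoding-time realignment is to blend an aligned model with its reference (SFT) model at decoding time rather than retraining for each regularization strength. A minimal sketch, assuming you already have both models' next-token logits as tensors; `lam` is the blending knob (0 recovers the reference model, 1 the aligned model).

```python
import torch
import torch.nn.functional as F

def realigned_sample(ref_logits, aligned_logits, lam=0.5, temperature=1.0):
    """Decoding-time realignment sketch: linearly interpolate the
    reference and aligned models' logits (equivalent to a geometric
    mixture of the two distributions) and sample the next token."""
    logits = (1.0 - lam) * ref_logits + lam * aligned_logits
    probs = F.softmax(logits / temperature, dim=-1)
    return torch.multinomial(probs, num_samples=1).item()
```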
Multi-Objective Alignment of Large Language Models Through Hypervolume Maximization
Multi-objective alignment from human feedback (MOAHF) in large language models (LLMs)
is a challenging problem as human preferences are complex, multifaceted, and often …
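The hypervolume indicator that this line of work maximizes has a concrete definition: for a set of reward vectors, it is the volume of objective space they dominate relative to a reference point. A minimal sketch for the two-objective maximization case (the reference point and reward values below are made up for illustration):

```python
def hypervolume_2d(points, ref):
    """Hypervolume of a 2-D maximization front: the area dominated by
    `points` and bounded below by the reference point `ref`."""
    pts = [p for p in points if p[0] > ref[0] and p[1] > ref[1]]
    pts.sort(key=lambda p: p[0], reverse=True)  # sweep by objective 1
    hv, best_y = 0.0, ref[1]
    for x, y in pts:
        if y > best_y:  # only non-dominated points add new area
            hv += (x - ref[0]) * (y - best_y)
            best_y = y
    return hv

# Two non-dominated reward vectors w.r.t. reference (0, 0):
# the union of rectangles [0,3]x[0,1] and [0,1]x[0,3] has area 5.
print(hypervolume_2d([(3.0, 1.0), (1.0, 3.0)], (0.0, 0.0)))  # 5.0
```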
Towards Inference-time Category-wise Safety Steering for Large Language Models
While large language models (LLMs) have seen unprecedented advancements in
capabilities and applications across a variety of use-cases, safety alignment of these models …
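The snippet ends before describing the method; inference-time steering approaches in general add a precomputed direction (here, presumably one per safety category) to a layer's hidden states during the forward pass. A minimal PyTorch sketch using a forward hook; the layer choice, steering vector, and `alpha` scale are all assumptions.

```python
import torch

def add_steering_hook(layer, steering_vec, alpha=1.0):
    """Register a forward hook that shifts a transformer layer's hidden
    states by `alpha` times a precomputed steering vector. Works for
    modules returning a tensor or a tuple whose first item is one."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        steered = hidden + alpha * steering_vec.to(hidden.dtype)
        return (steered,) + output[1:] if isinstance(output, tuple) else steered
    return layer.register_forward_hook(hook)

# The returned handle can be removed after generation: handle.remove()
```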
A Moral Imperative: The Need for Continual Superalignment of Large Language Models
This paper examines the challenges associated with achieving lifelong superalignment in
AI systems, particularly large language models (LLMs). Superalignment is a theoretical …