LLMs-as-judges: a comprehensive survey on LLM-based evaluation methods
The rapid advancement of Large Language Models (LLMs) has driven their expanding
application across various fields. One of the most promising applications is their role as …
REEF: Representation encoding fingerprints for large language models
Protecting the intellectual property of open-source Large Language Models (LLMs) is very
important, because training LLMs costs extensive computational resources and data …
Synthesizing post-training data for LLMs through multi-agent simulation
Post-training is essential for enabling large language models (LLMs) to follow human
instructions. Inspired by the recent success of using LLMs to simulate human society, we …
Align anything: Training all-modality models to follow instructions with language feedback
Reinforcement learning from human feedback (RLHF) has proven effective in enhancing the
instruction-following capabilities of large language models; however, it remains …
VLSBench: Unveiling visual leakage in multimodal safety
Safety concerns of multimodal large language models (MLLMs) have gradually become an
important problem in various applications. Surprisingly, previous works indicate a counter …
Position: LLM unlearning benchmarks are weak measures of progress
Unlearning methods have the potential to improve the privacy and safety of large language
models (LLMs) by removing sensitive or harmful information post hoc. The LLM unlearning …
Beyond Scalar Reward Model: Learning Generative Judge from Preference Data
Learning from preference feedback is a common practice for aligning large language
models (LLMs) with human values. Conventionally, preference data is learned and encoded …
Course-correction: Safety alignment using synthetic preferences
The risk of harmful content generated by large language models (LLMs) has become a critical
concern. This paper presents a systematic study on assessing and improving LLMs' …
Aligner: Efficient alignment by learning to correct
With the rapid development of large language models (LLMs) and ever-evolving practical
requirements, finding an efficient and effective alignment method has never been more …
Targeted manipulation and deception emerge when optimizing LLMs for user feedback
As LLMs become more widely deployed, there is increasing interest in directly optimizing for
feedback from end users (e.g., thumbs up) in addition to feedback from paid annotators …