Model merging in LLMs, MLLMs, and beyond: Methods, theories, applications and opportunities

E Yang, L Shen, G Guo, X Wang, X Cao… - arXiv preprint arXiv …, 2024 - arxiv.org
Model merging is an efficient empowerment technique in the machine learning community
that requires neither the collection of raw training data nor expensive …

Merge, ensemble, and cooperate! A survey on collaborative strategies in the era of large language models

J Lu, Z Pang, M Xiao, Y Zhu, R Xia, J Zhang - arXiv preprint arXiv …, 2024 - arxiv.org
The remarkable success of Large Language Models (LLMs) has ushered natural language
processing (NLP) research into a new era. Despite their diverse capabilities, LLMs trained …

Reinforcement Learning Enhanced LLMs: A Survey

S Wang, S Zhang, J Zhang, R Hu, X Li, T Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper surveys research in the rapidly growing field of enhancing large language
models (LLMs) with reinforcement learning (RL), a technique that enables LLMs to improve …

A Comprehensive Survey of Direct Preference Optimization: Datasets, Theories, Variants, and Applications

W Xiao, Z Wang, L Gan, S Zhao, W He, LA Tuan… - arXiv preprint arXiv …, 2024 - arxiv.org
With the rapid advancement of large language models (LLMs), aligning policy models with
human preferences has become increasingly critical. Direct Preference Optimization (DPO) …
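
As background for this entry and the DPO variants below, the commonly cited DPO objective (Rafailov et al., 2023) is sketched here; it is not quoted from the entry itself. With \pi_\theta the policy being trained, \pi_{\mathrm{ref}} a frozen reference policy, \beta the implicit KL-regularization strength, and (x, y_w, y_l) a prompt paired with a preferred and a dispreferred response:

\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) = -\,\mathbb{E}_{(x, y_w, y_l) \sim \mathcal{D}}\left[ \log \sigma\!\left( \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)} \right) \right]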

Eliminating biased length reliance of direct preference optimization via down-sampled KL divergence

J Lu, J Li, S An, M Zhao, Y He, D Yin, X Sun - arXiv preprint arXiv …, 2024 - arxiv.org
Direct Preference Optimization (DPO) has emerged as a prominent algorithm for the direct
and robust alignment of Large Language Models (LLMs) with human preferences, offering a …

From lists to emojis: How format bias affects model alignment

X Zhang, W Xiong, L Chen, T Zhou, H Huang… - arXiv preprint arXiv …, 2024 - arxiv.org
In this paper, we study format biases in reinforcement learning from human feedback
(RLHF). We observe that many widely-used preference models, including human …

Aqulia-Med LLM: Pioneering full-process open-source medical language models

L Zhao, W Zeng, X Shi, H Zhou, D Hao, Y Lin - arXiv preprint arXiv …, 2024 - arxiv.org
Recently, both closed-source LLMs and open-source communities have made significant
strides, outperforming humans in various general domains. However, their performance in …

Unlocking Decoding-time Controllability: Gradient-Free Multi-Objective Alignment with Contrastive Prompts

T Fu, Y Hou, J McAuley, R Yan - arXiv preprint arXiv:2408.05094, 2024 - arxiv.org
The task of multi-objective alignment aims to balance and control the different
alignment objectives (e.g., helpfulness, harmlessness, and honesty) of large language models …

Superficial safety alignment hypothesis

J Li, JE Kim - arXiv preprint arXiv:2410.10862, 2024 - arxiv.org
As large language models (LLMs) are increasingly integrated into
various applications, ensuring they generate safe and aligned responses is a pressing …

BAPO: Base-Anchored Preference Optimization for Overcoming Forgetting in Large Language Models Personalization

G Lee, M Jeong, Y Kim, H Jung, J Oh, S Kim… - arXiv preprint arXiv …, 2024 - arxiv.org
While learning to align Large Language Models (LLMs) with human preferences has shown
remarkable success, aligning these models to meet diverse user preferences presents …