Enhancing the reasoning ability of multimodal large language models via mixed preference optimization
Existing open-source multimodal large language models (MLLMs) generally follow a
training process involving pre-training and supervised fine-tuning. However, these models …
Anchored preference optimization and contrastive revisions: Addressing underspecification in alignment
Large Language Models (LLMs) are often aligned using contrastive alignment objectives
and preference pair datasets. The interaction between model, paired data, and objective …
Takin: A cohort of superior quality zero-shot speech generation models
S Chen, Y Feng, L He, T He, W He, Y Hu, B Lin… - arXiv preprint arXiv …, 2024 - arxiv.org
With the advent of the big data and large language model era, zero-shot personalized rapid
customization has emerged as a significant trend. In this report, we introduce Takin …
Current and future state of evaluation of large language models for medical summarization tasks
E Croxford, Y Gao, N Pellegrino, K Wong, G Wills… - npj Health …, 2025 - nature.com
Large Language Models have expanded the potential for clinical Natural Language
Generation (NLG), presenting new opportunities to manage the vast amounts of medical …
Not Everything is All You Need: Toward Low-Redundant Optimization for Large Language Model Alignment
Large language models (LLMs) still struggle to align with human preferences in
complex tasks and scenarios. They are prone to overfitting to unexpected patterns or …
Eliminating Biased Length Reliance of Direct Preference Optimization via Down-Sampled KL Divergence
Direct Preference Optimization (DPO) has emerged as a prominent algorithm for the direct
and robust alignment of Large Language Models (LLMs) with human preferences, offering a …
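For context on the objective this paper modifies, below is a minimal sketch of the standard DPO loss, which scores each preference pair by the margin between policy and reference log-probabilities of the chosen and rejected responses. The function name, argument names, and the use of summed sequence log-probabilities are illustrative assumptions; this is the baseline objective, not the paper's down-sampled KL variant.

    import torch.nn.functional as F

    def dpo_loss(policy_chosen_logps, policy_rejected_logps,
                 ref_chosen_logps, ref_rejected_logps, beta=0.1):
        # Each argument: 1-D tensor of per-example log p(y | x), summed over tokens.
        # Implicit rewards are the beta-scaled log-ratios of policy to reference model.
        chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
        rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
        # Negative log-sigmoid of the reward margin, averaged over the batch.
        return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

Because the per-example log-probabilities are summed over all tokens, longer responses accumulate larger magnitudes, which is the kind of length reliance the paper's title targets.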
On Weaponization-Resistant Large Language Models with Prospect Theoretic Alignment
Z Cheng, M Zhang, J Sun, W Dai - Proceedings of the 31st …, 2025 - aclanthology.org
Large language models (LLMs) have made significant advancements, but their increasing
capabilities present serious risks of misuse, particularly in open-weight models where direct …
LLM Safety Alignment is Divergence Estimation in Disguise
We propose a theoretical framework demonstrating that popular Large Language Model
(LLM) alignment methods, including Reinforcement Learning from Human Feedback (RLHF) …
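As background for the divergence framing, the standard KL-regularized RLHF objective (the usual textbook form, not notation taken from the paper) can be written as:

    \max_{\pi_\theta} \; \mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi_\theta(\cdot \mid x)} \big[ r(x, y) \big] \;-\; \beta \, D_{\mathrm{KL}}\!\big( \pi_\theta(\cdot \mid x) \,\|\, \pi_{\mathrm{ref}}(\cdot \mid x) \big)

Here r is the learned reward model, \pi_{\mathrm{ref}} the frozen reference policy, and \beta the weight on the KL penalty that the divergence-estimation view centers on.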
Empowering Community-Driven Determination of Values for Language Models
D Raman - 2024 - dspace.mit.edu
Emerging technologies like Artificial Intelligence and Large Language Models are often
developed in Western contexts and carry implicit values, from developer choices or …