From generation to judgment: Opportunities and challenges of LLM-as-a-Judge

D Li, B Jiang, L Huang, A Beigi, C Zhao, Z Tan… - arXiv preprint arXiv …, 2024 - arxiv.org
Assessment and evaluation have long been critical challenges in artificial intelligence (AI)
and natural language processing (NLP). However, traditional methods, whether matching …

Generative language models exhibit social identity biases

T Hu, Y Kyrychenko, S Rathje, N Collier… - Nature Computational …, 2025 - nature.com
Social identity biases, particularly the tendency to favor one's own group (ingroup solidarity)
and derogate other groups (outgroup hostility), are deeply rooted in human psychology and …

Reinforcement Learning Enhanced LLMs: A Survey

S Wang, S Zhang, J Zhang, R Hu, X Li, T Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper surveys research in the rapidly growing field of enhancing large language
models (LLMs) with reinforcement learning (RL), a technique that enables LLMs to improve …

Thinking LLMs: General instruction following with thought generation

T Wu, J Lan, W Yuan, J Jiao, J Weston… - arXiv preprint arXiv …, 2024 - arxiv.org
LLMs are typically trained to answer user questions or follow instructions similarly to how
human experts respond. However, in the standard alignment framework they lack the basic …

Preference tuning with human feedback on language, speech, and vision tasks: A survey

GI Winata, H Zhao, A Das, W Tang, DD Yao… - arXiv preprint arXiv …, 2024 - arxiv.org
Preference tuning is a crucial process for aligning deep generative models with human
preferences. This survey offers a thorough overview of recent advancements in preference …

LMUnit: Fine-grained evaluation with natural language unit tests

J Saad-Falcon, R Vivek, W Berrios, NS Naik… - arXiv preprint arXiv …, 2024 - arxiv.org
As language models become integral to critical workflows, assessing their behavior remains
a fundamental challenge--human evaluation is costly and noisy, while automated metrics …

How Reliable Is Human Feedback For Aligning Large Language Models?

MH Yeh, L Tao, J Wang, X Du, Y Li - arXiv preprint arXiv:2410.01957, 2024 - arxiv.org
Most alignment research today focuses on designing new learning algorithms using
datasets like Anthropic-HH, assuming human feedback data is inherently reliable. However …

Fanar: An Arabic-Centric Multimodal Generative AI Platform

F Team, U Abbas, MS Ahmad, F Alam… - arXiv preprint arXiv …, 2025 - arxiv.org
We present Fanar, a platform for Arabic-centric multimodal generative AI systems that
supports language, speech and image generation tasks. At the heart of Fanar are Fanar Star …

Cross-lingual Transfer of Reward Models in Multilingual Alignment

J Hong, N Lee, R Martínez-Castaño… - arXiv preprint arXiv …, 2024 - arxiv.org
Reinforcement learning with human feedback (RLHF) has been shown to benefit greatly from
precise reward models (RMs). However, recent studies in reward modeling schemes are …