LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods

H Li, Q Dong, J Chen, H Su, Y Zhou, Q Ai, Z Ye… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid advancement of Large Language Models (LLMs) has driven their expanding
application across various fields. One of the most promising applications is their role as …

Can LLMs replace manual annotation of software engineering artifacts?

T Ahmed, P Devanbu, C Treude, M Pradel - arXiv preprint arXiv …, 2024 - arxiv.org
Experimental evaluations of software engineering innovations, e.g., tools and processes,
often include human-subject studies as a component of a multi-pronged strategy to obtain …

Plan-Then-Execute: An Empirical Study of User Trust and Team Performance When Using LLM Agents As A Daily Assistant

G He, G Demartini, U Gadiraju - arXiv preprint arXiv:2502.01390, 2025 - arxiv.org
Since the explosion in popularity of ChatGPT, large language models (LLMs) have
continued to impact our everyday lives. Equipped with external tools that are designed for a …

Assessing empathy in large language models with real-world physician-patient interactions

M Luo, CJ Warren, L Cheng… - … Conference on Big …, 2024 - ieeexplore.ieee.org
The integration of Large Language Models (LLMs) into the healthcare domain has the
potential to significantly enhance patient care and support through the development of …

Personas with Attitudes: Controlling LLMs for Diverse Data Annotation

L Fröhling, G Demartini, D Assenmacher - arXiv preprint arXiv:2410.11745, 2024 - arxiv.org
We present a novel approach for enhancing diversity and control in data annotation tasks by
personalizing large language models (LLMs). We investigate the impact of injecting diverse …

ELITR-Bench: A Meeting Assistant Benchmark for Long-Context Language Models

T Thonet, J Rozen, L Besacier - arXiv preprint arXiv:2403.20262, 2024 - arxiv.org
Research on Large Language Models (LLMs) has recently witnessed an increasing interest
in extending the models' context size to better capture dependencies within long documents …

Using LLMs to Establish Implicit User Sentiment of Software Desirability

S Weitl-Harms, JD Hastings, J Lum - arXiv preprint arXiv:2408.01527, 2024 - arxiv.org
This study explores the use of LLMs for providing quantitative zero-shot sentiment analysis
of implicit software desirability, addressing a critical challenge in product evaluation where …

PATCH: Empowering Large Language Model with Programmer-Intent Guidance and Collaborative-Behavior Simulation for Automatic Bug Fixing

Y Zhang, Z Jin, Y Xing, G Li, F Liu, J Zhu… - ACM Transactions on …, 2025 - dl.acm.org
Bug fixing holds significant importance in software development and maintenance. Recent
research has made substantial strides in exploring the potential of large language models …

Model-in-the-Loop (MILO): Accelerating Multimodal AI Data Annotation with LLMs

Y Wang, D Stevens, P Shah, W Jiang, M Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
The growing demand for AI training data has transformed data annotation into a global
industry, but traditional approaches relying on human annotators are often time-consuming …