Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge

J Ye, Y Wang, Y Huang, D Chen, Q Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
LLM-as-a-Judge has been widely utilized as an evaluation method in various benchmarks
and served as supervised rewards in model training. However, despite their excellence in …

Agent Laboratory: Using LLM Agents as Research Assistants

S Schmidgall, Y Su, Z Wang, X Sun, J Wu, X Yu… - arXiv preprint arXiv …, 2025 - arxiv.org
Historically, scientific discovery has been a lengthy and costly process, demanding
substantial time and resources from initial conception to final results. To accelerate scientific …