From generation to judgment: Opportunities and challenges of LLM-as-a-judge

D Li, B Jiang, L Huang, A Beigi, C Zhao, Z Tan… - arXiv preprint arXiv …, 2024 - arxiv.org
Assessment and evaluation have long been critical challenges in artificial intelligence (AI)
and natural language processing (NLP). However, traditional methods, whether matching …

LLMs-as-judges: a comprehensive survey on LLM-based evaluation methods

H Li, Q Dong, J Chen, H Su, Y Zhou, Q Ai, Z Ye… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid advancement of Large Language Models (LLMs) has driven their expanding
application across various fields. One of the most promising applications is their role as …

LegalAgentBench: Evaluating LLM Agents in Legal Domain

H Li, J Chen, J Yang, Q Ai, W Jia, Y Liu, K Lin… - arXiv preprint arXiv …, 2024 - arxiv.org
With the increasing intelligence and autonomy of LLM agents, their potential applications in
the legal domain are becoming increasingly apparent. However, existing general-domain …