- Academic Search

From generation to judgment: Opportunities and challenges of LLM-as-a-judge

D Li, B Jiang, L Huang, A Beigi, C Zhao, Z Tan… - arXiv preprint arXiv …, 2024 - arxiv.org

Assessment and evaluation have long been critical challenges in artificial intelligence (AI)
and natural language processing (NLP). However, traditional methods, whether matching …

Cited by 11 · Related articles · All 2 versions · HTML version

[PDF] arxiv.org

LLMs-as-judges: a comprehensive survey on LLM-based evaluation methods

H Li, Q Dong, J Chen, H Su, Y Zhou, Q Ai, Z Ye… - arXiv preprint arXiv …, 2024 - arxiv.org

The rapid advancement of Large Language Models (LLMs) has driven their expanding
application across various fields. One of the most promising applications is their role as …

Cited by 2 · Related articles · HTML version

[PDF] arxiv.org

LegalAgentBench: Evaluating LLM Agents in Legal Domain

H Li, J Chen, J Yang, Q Ai, W Jia, Y Liu, K Lin… - arXiv preprint arXiv …, 2024 - arxiv.org

With the increasing intelligence and autonomy of LLM agents, their potential applications in
the legal domain are becoming increasingly apparent. However, existing general-domain …

Related articles · HTML version

Saved to “My library”

CalibraEval: Calibrating prediction distribution to mitigate selection bias in LLMs-as-judges

From generation to judgment: Opportunities and challenges of LLM-as-a-judge

LLMs-as-judges: a comprehensive survey on LLM-based evaluation methods

LegalAgentBench: Evaluating LLM Agents in Legal Domain