From generation to judgment: Opportunities and challenges of LLM-as-a-judge

D Li, B Jiang, L Huang, A Beigi, C Zhao, Z Tan… - arXiv preprint arXiv …, 2024 - arxiv.org
Assessment and evaluation have long been critical challenges in artificial intelligence (AI)
and natural language processing (NLP). However, traditional methods, whether matching …

A Survey on LLM-as-a-Judge

J Gu, X Jiang, Z Shi, H Tan, X Zhai, C Xu, W Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Accurate and consistent evaluation is crucial for decision-making across numerous fields,
yet it remains a challenging task due to inherent subjectivity, variability, and scale. Large …

KV cache compression, but what must we give in return? A comprehensive benchmark of long context capable approaches

J Yuan, H Liu, S Zhong, YN Chuang, S Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Long context capability is a crucial competency for large language models (LLMs) as it
mitigates the human struggle to digest long-form texts. This capability enables complex task …

GEXIA: Granularity Expansion and Iterative Approximation for Scalable Multi-grained Video-language Learning

Y Wang, Z Zhang, J Wang, D Fan, Z Xu, L Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
In various video-language learning tasks, the challenge of achieving cross-modality
alignment with multi-grained data persists. We propose a method to tackle this challenge …