- Academic Search

From generation to judgment: Opportunities and challenges of LLM-as-a-judge

D Li, B Jiang, L Huang, A Beigi, C Zhao, Z Tan… - arXiv preprint arXiv …, 2024 - arxiv.org

Assessment and evaluation have long been critical challenges in artificial intelligence (AI)
and natural language processing (NLP). However, traditional methods, whether matching …

Cited by 11 · All 3 versions

[PDF] arxiv.org

A Survey on LLM-as-a-Judge

J Gu, X Jiang, Z Shi, H Tan, X Zhai, C Xu, W Li… - arXiv preprint arXiv …, 2024 - arxiv.org

Accurate and consistent evaluation is crucial for decision-making across numerous fields,
yet it remains a challenging task due to inherent subjectivity, variability, and scale. Large …

Cited by 10 · All 2 versions

[PDF] arxiv.org

KV cache compression, but what must we give in return? A comprehensive benchmark of long context capable approaches

J Yuan, H Liu, S Zhong, YN Chuang, S Li… - arXiv preprint arXiv …, 2024 - arxiv.org

Long context capability is a crucial competency for large language models (LLMs) as it
mitigates the human struggle to digest long-form texts. This capability enables complex task …

Cited by 9 · All 4 versions

[PDF] arxiv.org

GEXIA: Granularity Expansion and Iterative Approximation for Scalable Multi-grained Video-language Learning

Y Wang, Z Zhang, J Wang, D Fan, Z Xu, L Liu… - arXiv preprint arXiv …, 2024 - arxiv.org

In various video-language learning tasks, the challenge of achieving cross-modality
alignment with multi-grained data persists. We propose a method to tackle this challenge …

All 3 versions

DHP Benchmark: Are LLMs Good NLG Evaluators?

From generation to judgment: Opportunities and challenges of LLM-as-a-judge
