Explainable generative AI (GenXAI): A survey, conceptualization, and research agenda

J Schneider - Artificial Intelligence Review, 2024 - Springer
Generative AI (GenAI) represents a shift from AI's ability to “recognize” to its ability to
“generate” solutions for a wide range of tasks. As generated solutions and applications grow …

Leveraging large language models for NLG evaluation: Advances and challenges

Z Li, X Xu, T Shen, C Xu, JC Gu, Y Lai, C Tao… - arXiv preprint arXiv …, 2024 - arxiv.org
In the rapidly evolving domain of Natural Language Generation (NLG) evaluation,
introducing Large Language Models (LLMs) has opened new avenues for assessing …

Judging the judges: Evaluating alignment and vulnerabilities in LLMs-as-judges

AS Thakur, K Choudhary, VS Ramayapally… - arXiv preprint arXiv …, 2024 - arxiv.org
Offering a promising solution to the scalability challenges associated with human evaluation,
the LLM-as-a-judge paradigm is rapidly gaining traction as an approach to evaluating large …

Superfiltering: Weak-to-strong data filtering for fast instruction-tuning

M Li, Y Zhang, S He, Z Li, H Zhao, J Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Instruction tuning is critical to improve LLMs but usually suffers from low-quality and
redundant data. Data filtering for instruction tuning has proved important in improving both …

A Survey on LLM-as-a-Judge

J Gu, X Jiang, Z Shi, H Tan, X Zhai, C Xu, W Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Accurate and consistent evaluation is crucial for decision-making across numerous fields,
yet it remains a challenging task due to inherent subjectivity, variability, and scale. Large …

Are LLM-based Evaluators Confusing NLG Quality Criteria?

X Hu, M Gao, S Hu, Y Zhang, Y Chen, T Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
Some prior work has shown that LLMs perform well in NLG evaluation for different tasks.
However, we discover that LLMs seem to confuse different evaluation criteria, which reduces …

Extending context window of large language models via semantic compression

W Fei, X Niu, P Zhou, L Hou, B Bai, L Deng… - arXiv preprint arXiv …, 2023 - arxiv.org
Transformer-based Large Language Models (LLMs) often impose limitations on the length of
the text input to ensure the generation of fluent and relevant responses. This constraint …

CopyBench: Measuring literal and non-literal reproduction of copyright-protected text in language model generation

T Chen, A Asai, N Mireshghallah, S Min… - arXiv preprint arXiv …, 2024 - arxiv.org
Evaluating the degree of reproduction of copyright-protected content by language models
(LMs) is of significant interest to the AI and legal communities. Although both literal and non …

Towards completeness-oriented tool retrieval for large language models

C Qu, S Dai, X Wei, H Cai, S Wang, D Yin, J Xu… - Proceedings of the 33rd …, 2024 - dl.acm.org
Recently, integrating external tools with Large Language Models (LLMs) has gained
significant attention as an effective strategy to mitigate the limitations inherent in their pre …

Rethinking the roles of large language models in Chinese grammatical error correction

Y Li, S Qin, H Huang, Y Li, L Qin, X Hu, W Jiang… - arXiv preprint arXiv …, 2024 - arxiv.org
Recently, Large Language Models (LLMs) have been widely studied by researchers for their
roles in various downstream NLP tasks. As a fundamental task in the NLP field, Chinese …