Explainable generative AI (GenXAI): A survey, conceptualization, and research agenda

J Schneider - Artificial Intelligence Review, 2024 - Springer
Generative AI (GenAI) represents a shift from AI's ability to “recognize” to its ability to
“generate” solutions for a wide range of tasks. As generated solutions and applications grow …

Leveraging large language models for NLG evaluation: Advances and challenges

Z Li, X Xu, T Shen, C Xu, JC Gu, Y Lai, C Tao… - arXiv preprint arXiv …, 2024 - arxiv.org
In the rapidly evolving domain of Natural Language Generation (NLG) evaluation,
introducing Large Language Models (LLMs) has opened new avenues for assessing …

Judging the judges: Evaluating alignment and vulnerabilities in LLMs-as-judges

AS Thakur, K Choudhary, VS Ramayapally… - arXiv preprint arXiv …, 2024 - arxiv.org
Offering a promising solution to the scalability challenges associated with human evaluation,
the LLM-as-a-judge paradigm is rapidly gaining traction as an approach to evaluating large …

Superfiltering: Weak-to-strong data filtering for fast instruction-tuning

M Li, Y Zhang, S He, Z Li, H Zhao, J Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Instruction tuning is critical to improve LLMs but usually suffers from low-quality and
redundant data. Data filtering for instruction tuning has proved important in improving both …

A Survey on LLM-as-a-Judge

J Gu, X Jiang, Z Shi, H Tan, X Zhai, C Xu, W Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Accurate and consistent evaluation is crucial for decision-making across numerous fields,
yet it remains a challenging task due to inherent subjectivity, variability, and scale. Large …

Are LLM-based Evaluators Confusing NLG Quality Criteria?

X Hu, M Gao, S Hu, Y Zhang, Y Chen, T Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
Some prior work has shown that LLMs perform well in NLG evaluation for different tasks.
However, we discover that LLMs seem to confuse different evaluation criteria, which reduces …

Extending context window of large language models via semantic compression

W Fei, X Niu, P Zhou, L Hou, B Bai, L Deng… - arXiv preprint arXiv …, 2023 - arxiv.org
Transformer-based Large Language Models (LLMs) often impose limitations on the length of
the text input to ensure the generation of fluent and relevant responses. This constraint …

CopyBench: Measuring literal and non-literal reproduction of copyright-protected text in language model generation

T Chen, A Asai, N Mireshghallah, S Min… - arXiv preprint arXiv …, 2024 - arxiv.org
Evaluating the degree of reproduction of copyright-protected content by language models
(LMs) is of significant interest to the AI and legal communities. Although both literal and non …

Towards completeness-oriented tool retrieval for large language models

C Qu, S Dai, X Wei, H Cai, S Wang, D Yin, J Xu… - Proceedings of the 33rd …, 2024 - dl.acm.org
Recently, integrating external tools with Large Language Models (LLMs) has gained
significant attention as an effective strategy to mitigate the limitations inherent in their pre …

Rethinking the roles of large language models in Chinese grammatical error correction

Y Li, S Qin, H Huang, Y Li, L Qin, X Hu, W Jiang… - arXiv preprint arXiv …, 2024 - arxiv.org
Recently, Large Language Models (LLMs) have been widely studied by researchers for their
roles in various downstream NLP tasks. As a fundamental task in the NLP field, Chinese …