FActScore: Fine-grained atomic evaluation of factual precision in long form text generation

S Min, K Krishna, X Lyu, M Lewis, W Yih… - arXiv preprint arXiv …, 2023 - arxiv.org

Evaluating the factuality of long-form text generated by large language models (LMs) is non-
trivial because (1) generations often contain a mixture of supported and unsupported pieces …

Cited by 480 · All 9 versions

[PDF] mit.edu

Evaluating correctness and faithfulness of instruction-following models for question answering

V Adlakha, P BehnamGhader, XH Lu… - Transactions of the …, 2024 - direct.mit.edu

Instruction-following models are attractive alternatives to fine-tuned approaches for question
answering (QA). By simply prepending relevant documents and an instruction to their input …

Cited by 118 · All 7 versions

[PDF] arxiv.org

Large language model alignment: A survey

T Shen, R **, Y Huang, C Liu, W Dong, Z Guo… - arXiv preprint arXiv …, 2023 - arxiv.org

Recent years have witnessed remarkable progress made in large language models (LLMs).
Such advancements, while garnering significant attention, have concurrently elicited various …

Cited by 161 · All 2 versions

[PDF] aaai.org

Interpretable long-form legal question answering with retrieval-augmented large language models

A Louis, G van Dijck, G Spanakis - … of the AAAI Conference on Artificial …, 2024 - ojs.aaai.org

Many individuals are likely to face a legal dispute at some point in their lives, but their lack of
understanding of how to navigate these complex issues often renders them vulnerable. The …

Cited by 59 · All 6 versions

[PDF] arxiv.org

ExpertQA: Expert-curated questions and attributed answers

C Malaviya, S Lee, S Chen, E Sieber, M Yatskar… - arXiv preprint arXiv …, 2023 - arxiv.org

As language models are adopted by a more sophisticated and diverse set of users, the
importance of guaranteeing that they provide factually correct information supported by …

Cited by 68 · All 5 versions

[PDF] arxiv.org

PRD: Peer rank and discussion improve large language model based evaluations

R Li, T Patel, X Du - arXiv preprint arXiv:2307.02762, 2023 - arxiv.org

Nowadays, the quality of responses generated by different modern large language models
(LLMs) is hard to evaluate and compare automatically. Recent studies suggest and …

Cited by 73 · All 4 versions

[PDF] arxiv.org

Evaluating very long-term conversational memory of LLM agents

A Maharana, DH Lee, S Tulyakov, M Bansal… - arXiv preprint arXiv …, 2024 - arxiv.org

Existing works on long-term open-domain dialogues focus on evaluating model responses
within contexts spanning no more than five chat sessions. Despite advancements in long …

Cited by 37 · All 4 versions

[PDF] arxiv.org

The responsible foundation model development cheatsheet: A review of tools & resources

S Longpre, S Biderman, A Albalak… - arXiv preprint arXiv …, 2024 - arxiv.org

Foundation model development attracts a rapidly expanding body of contributors, scientists,
and applications. To help shape responsible development practices, we introduce the …

Cited by 7 · All 3 versions

[PDF] neurips.cc

CRAG - Comprehensive RAG benchmark

X Yang, K Sun, H **n, Y Sun, N Bhalla… - Advances in …, 2025 - proceedings.neurips.cc

Retrieval-Augmented Generation (RAG) has recently emerged as a promising
solution to alleviate Large Language Model (LLM)'s deficiency in lack of knowledge. Existing …

Cited by 15 · All 6 versions

[PDF] arxiv.org

Benchmark evaluations, applications, and challenges of large vision language models: A survey

Z Li, X Wu, H Du, H Nghiem, G Shi - arXiv preprint arXiv:2501.02189, 2025 - arxiv.org

Multimodal Vision Language Models (VLMs) have emerged as a transformative technology
at the intersection of computer vision and natural language processing, enabling machines …

Cited by 5 · All 3 versions
