Google Scholar

Survey of hallucination in natural language generation

Z Ji, N Lee, R Frieske, T Yu, D Su, Y Xu, E Ishii… - ACM Computing …, 2023 - dl.acm.org

Natural Language Generation (NLG) has improved exponentially in recent years thanks to
the development of sequence-to-sequence deep learning technologies such as Transformer …

Cited by 3303 · Related articles · All 7 versions

[PDF] arxiv.org

Evaluating large language models: A comprehensive survey

Z Guo, R Jin, C Liu, Y Huang, D Shi, L Yu, Y Liu… - arXiv preprint arXiv …, 2023 - arxiv.org

Large language models (LLMs) have demonstrated remarkable capabilities across a broad
spectrum of tasks. They have attracted significant attention and been deployed in numerous …

Cited by 130 · Related articles · All 2 versions · HTML version

[PDF] mit.edu

Benchmarking large language models for news summarization

T Zhang, F Ladhak, E Durmus, P Liang… - Transactions of the …, 2024 - direct.mit.edu

Large language models (LLMs) have shown promise for automatic summarization but the
reasons behind their successes are poorly understood. By conducting a human evaluation …

Cited by 471 · Related articles · All 6 versions

[PDF] arxiv.org

Holistic evaluation of language models

P Liang, R Bommasani, T Lee, D Tsipras… - arXiv preprint arXiv …, 2022 - arxiv.org

Language models (LMs) are becoming the foundation for almost all major language
technologies, but their capabilities, limitations, and risks are not well understood. We present …

[PDF] arxiv.org

FActScore: Fine-grained atomic evaluation of factual precision in long form text generation

S Min, K Krishna, X Lyu, M Lewis, W Yih… - arXiv preprint arXiv …, 2023 - arxiv.org

Evaluating the factuality of long-form text generated by large language models (LMs) is non-
trivial because (1) generations often contain a mixture of supported and unsupported pieces …

Cited by 474 · Related articles · All 8 versions · HTML version

[PDF] acm.org

A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions

L Huang, W Yu, W Ma, W Zhong, Z Feng… - ACM Transactions on …, 2024 - dl.acm.org

The emergence of large language models (LLMs) has marked a significant breakthrough in
natural language processing (NLP), fueling a paradigm shift in information acquisition …

Cited by 110 · Related articles

[PDF] arxiv.org

News summarization and evaluation in the era of GPT-3

T Goyal, JJ Li, G Durrett - arXiv preprint arXiv:2209.12356, 2022 - arxiv.org

The recent success of zero- and few-shot prompting with models like GPT-3 has led to a
paradigm shift in NLP research. In this paper, we study its impact on text summarization …

Cited by 404 · Related articles · All 2 versions · HTML version

[PDF] arxiv.org

ChatGPT as a factual inconsistency evaluator for text summarization

Z Luo, Q Xie, S Ananiadou - arXiv preprint arXiv:2303.15621, 2023 - arxiv.org

The performance of text summarization has been greatly boosted by pre-trained language
models. A main concern of existing methods is that most generated summaries are not …

Cited by 165 · Related articles · All 2 versions · HTML version

[PDF] arxiv.org

Towards a unified multi-dimensional evaluator for text generation

M Zhong, Y Liu, D Yin, Y Mao, Y Jiao, P Liu… - arXiv preprint arXiv …, 2022 - arxiv.org

Multi-dimensional evaluation is the dominant paradigm for human evaluation in Natural
Language Generation (NLG), i.e., evaluating the generated text from multiple explainable …

Cited by 221 · Related articles · All 6 versions · HTML version

[PDF] arxiv.org

TRUE: Re-evaluating factual consistency evaluation

O Honovich, R Aharoni, J Herzig, H Taitelbaum… - arXiv preprint arXiv …, 2022 - arxiv.org

Grounded text generation systems often generate text that contains factual inconsistencies,
hindering their real-world applicability. Automatic factual consistency evaluation may help …

Saved to library

SummaC: Re-visiting NLI-based models for inconsistency detection in summarization

Survey of hallucination in natural language generation
