Google Scholar

Survey of hallucination in natural language generation

Z Ji, N Lee, R Frieske, T Yu, D Su, Y Xu, E Ishii… - ACM Computing …, 2023 - dl.acm.org

Natural Language Generation (NLG) has improved exponentially in recent years thanks to
the development of sequence-to-sequence deep learning technologies such as Transformer …

Cited by 3397 - All 8 versions

Evaluating large language models: A comprehensive survey

Z Guo, R Jin, C Liu, Y Huang, D Shi, L Yu, Y Liu… - arXiv preprint arXiv …, 2023 - arxiv.org

Large language models (LLMs) have demonstrated remarkable capabilities across a broad
spectrum of tasks. They have attracted significant attention and been deployed in numerous …

Cited by 135 - All 2 versions

[PDF] arxiv.org

A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions

L Huang, W Yu, W Ma, W Zhong, Z Feng… - ACM Transactions on …, 2025 - dl.acm.org

The emergence of large language models (LLMs) has marked a significant breakthrough in
natural language processing (NLP), fueling a paradigm shift in information acquisition …

Cited by 945 - All 4 versions

[PDF] arxiv.org

G-Eval: NLG evaluation using GPT-4 with better human alignment

Y Liu, D Iter, Y Xu, S Wang, R Xu, C Zhu - arXiv preprint arXiv:2303.16634, 2023 - arxiv.org

The quality of texts generated by natural language generation (NLG) systems is hard to
measure automatically. Conventional reference-based metrics, such as BLEU and ROUGE …

Cited by 984 - All 7 versions

[PDF] arxiv.org

FActScore: Fine-grained atomic evaluation of factual precision in long form text generation

S Min, K Krishna, X Lyu, M Lewis, W Yih… - arXiv preprint arXiv …, 2023 - arxiv.org

Evaluating the factuality of long-form text generated by large language models (LMs) is
non-trivial because (1) generations often contain a mixture of supported and unsupported pieces …

Cited by 486 - All 9 versions

[PDF] arxiv.org

Enabling large language models to generate text with citations

T Gao, H Yen, J Yu, D Chen - arXiv preprint arXiv:2305.14627, 2023 - arxiv.org

Large language models (LLMs) have emerged as a widely-used tool for information
seeking, but their generated outputs are prone to hallucination. In this work, our aim is to …

Cited by 255 - All 8 versions

[PDF] thecvf.com

TIFA: Accurate and interpretable text-to-image faithfulness evaluation with question answering

Y Hu, B Liu, J Kasai, Y Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com

Despite thousands of researchers, engineers, and artists actively working on improving
text-to-image generation models, systems often fail to produce images that accurately align with …

Cited by 172 - All 7 versions

[PDF] arxiv.org

Beyond the imitation game: Quantifying and extrapolating the capabilities of language models

A Srivastava, A Rastogi, A Rao, AAM Shoeb… - arXiv preprint arXiv …, 2022 - arxiv.org

Language models demonstrate both quantitative improvement and new qualitative
capabilities with increasing scale. Despite their potentially transformative impact, these new …

Cited by 1352 - All 15 versions

[PDF] arxiv.org

RARR: Researching and revising what language models say, using language models

L Gao, Z Dai, P Pasupat, A Chen, AT Chaganty… - arXiv preprint arXiv …, 2022 - arxiv.org

Language models (LMs) now excel at many tasks such as few-shot learning, question
answering, reasoning, and dialog. However, they sometimes generate unsupported or …

Cited by 260 - All 4 versions

[PDF] arxiv.org

Towards a unified multi-dimensional evaluator for text generation

M Zhong, Y Liu, D Yin, Y Mao, Y Jiao, P Liu… - arXiv preprint arXiv …, 2022 - arxiv.org

Multi-dimensional evaluation is the dominant paradigm for human evaluation in Natural
Language Generation (NLG), i.e., evaluating the generated text from multiple explainable …

Cited by 222 - All 6 versions

Asking and answering questions to evaluate the factual consistency of summaries

Survey of hallucination in natural language generation‏
