SESCORE2: Learning text generation evaluation via synthesizing realistic mistakes

W Xu, X Qian, M Wang, L Li, WY Wang - arXiv preprint arXiv:2212.09305, 2022 - arxiv.org
Is it possible to train a general metric for evaluating text generation quality without human-
annotated ratings? Existing learned metrics either perform unsatisfactorily across text …

Multilingual conceptual coverage in text-to-image models

M Saxon, WY Wang - arXiv preprint arXiv:2306.01735, 2023 - arxiv.org
We propose" Conceptual Coverage Across Languages"(CoCo-CroLa), a technique for
benchmarking the degree to which any generative text-to-image system provides …

A review of faithfulness metrics for hallucination assessment in Large Language Models

B Malin, T Kalganova, N Boulgouris - arXiv preprint arXiv:2501.00269, 2024 - arxiv.org
This review examines the means by which faithfulness has been evaluated across open-
ended summarization, question-answering and machine translation tasks. We find that the …

Towards fine-grained information: Identifying the type and location of translation errors

K Bao, Y Wan, D Liu, B Yang, W Lei, X He… - arXiv preprint arXiv …, 2023 - arxiv.org
Fine-grained information on translation errors is helpful for the translation evaluation
community. Existing approaches cannot simultaneously consider error position and type …

Translation Canvas: An Explainable Interface to Pinpoint and Analyze Translation Systems

C Dandekar, W Xu, X Xu, S Ouyang, L Li - arXiv preprint arXiv:2410.10861, 2024 - arxiv.org
With the rapid advancement of machine translation research, evaluation toolkits have
become essential for benchmarking system progress. Tools like COMET and SacreBLEU …

Error Analysis Prompting Enables Human-Like Translation Evaluation in Large Language Models

Q Lu, B Qiu, L Ding, K Zhang, T Kocmi… - arXiv preprint arXiv …, 2023 - arxiv.org
Generative large language models (LLMs), e.g., ChatGPT, have demonstrated remarkable
proficiency across several NLP tasks, such as machine translation, text summarization …