LLM-based NLG evaluation: Current status and challenges

M Gao, X Hu, J Ruan, X Pu, X Wan - arXiv preprint arXiv:2402.01383, 2024 - arxiv.org
Evaluating natural language generation (NLG) is a vital but challenging problem in artificial
intelligence. Traditional evaluation metrics mainly capturing content (e.g., n-gram) overlap …

xCOMET: Transparent Machine Translation Evaluation through Fine-grained Error Detection

NM Guerreiro, R Rei, D van Stigt, L Coheur… - Transactions of the …, 2024 - direct.mit.edu
Widely used learned metrics for machine translation evaluation, such as COMET and BLEURT,
estimate the quality of a translation hypothesis by providing a single sentence-level score …

Error analysis prompting enables human-like translation evaluation in large language models

Q Lu, B Qiu, L Ding, K Zhang, T Kocmi… - arXiv preprint arXiv …, 2023 - arxiv.org
Generative large language models (LLMs), e.g., ChatGPT, have demonstrated remarkable
proficiency across several NLP tasks, such as machine translation, text summarization …

Adapting large language models for document-level machine translation

M Wu, TT Vu, L Qu, G Foster, G Haffari - arXiv preprint arXiv:2401.06468, 2024 - arxiv.org
Large language models (LLMs) have significantly advanced various natural language
processing (NLP) tasks. Recent research indicates that moderately-sized LLMs often …

LLaMAX: Scaling linguistic horizons of LLM by enhancing translation capabilities beyond 100 languages

Y Lu, W Zhu, L Li, Y Qiao, F Yuan - arXiv preprint arXiv:2407.05975, 2024 - arxiv.org
Large Language Models (LLMs) demonstrate remarkable translation capabilities in high-
resource language tasks, yet their performance in low-resource languages is hindered by …

Navigating the metrics maze: Reconciling score magnitudes and accuracies

T Kocmi, V Zouhar, C Federmann, M Post - arXiv preprint arXiv …, 2024 - arxiv.org
Ten years ago a single metric, BLEU, governed progress in machine translation research.
For better or worse, there is no such consensus today, and consequently it is difficult for …

TEaR: Improving LLM-based machine translation with systematic self-refinement

Z Feng, Y Zhang, H Li, B Wu, J Liao, W Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have achieved impressive results in Machine Translation
(MT). However, careful evaluations by humans reveal that the translations produced by LLMs …

Assessing the Role of Context in Chat Translation Evaluation: Is Context Helpful and Under What Conditions?

S Agrawal, A Farajian, P Fernandes, R Rei… - Transactions of the …, 2024 - direct.mit.edu
Despite the recent success of automatic metrics for assessing translation quality, their
application in evaluating the quality of machine-translated chats has been limited. Unlike …

Machine translation meta evaluation through translation accuracy challenge sets

N Moghe, A Fazla, C Amrhein, T Kocmi… - Computational …, 2024 - direct.mit.edu
Recent machine translation (MT) metrics calibrate their effectiveness by correlating with
human judgment. However, these results are often obtained by averaging predictions across …

PrExMe! Large scale prompt exploration of open source LLMs for machine translation and summarization evaluation

C Leiter, S Eger - arXiv preprint arXiv:2406.18528, 2024 - arxiv.org
Large language models (LLMs) have revolutionized the field of NLP. Notably, their in-
context learning capabilities also enable their use as evaluation metrics for natural language …