How to evaluate machine translation: A review of automated and human metrics
E Chatzikoumi - Natural Language Engineering, 2020 - cambridge.org
This article presents the most up-to-date, influential automated, semiautomated and human
metrics used to evaluate the quality of machine translation (MT) output and provides the …
Large language models effectively leverage document-level context for literary translation, but critical errors persist
Large language models (LLMs) are competitive with the state of the art on a wide range of
sentence-level translation datasets. However, their ability to translate paragraphs and …
The Eval4NLP shared task on explainable quality estimation: Overview and results
In this paper, we introduce the Eval4NLP-2021 shared task on explainable quality
estimation. Given a source-translation pair, this shared task requires not only to provide a …
Error classification and analysis for machine translation quality assessment
M Popović - Translation quality assessment: From principles to …, 2018 - Springer
This chapter presents an overview of different approaches and tasks related to classification
and analysis of errors in machine translation (MT) output. Manual error classification is a …
Fine-grained human evaluation of neural versus phrase-based machine translation
We compare three approaches to statistical machine translation (pure phrase-based,
factored phrase-based and neural) by performing a fine-grained manual evaluation via error …
How far are we from fully automatic high quality grammatical error correction?
In this paper, we first explore the role of inter-annotator agreement statistics in grammatical
error correction and conclude that they are less informative in fields where there may be …
GPT-4 vs. human translators: A comprehensive evaluation of translation quality across languages, domains, and expertise levels
This study comprehensively evaluates the translation quality of Large Language Models
(LLMs), specifically GPT-4, against human translators of varying expertise levels across …
Quantitative fine-grained human evaluation of machine translation systems: a case study on English to Croatian
This paper presents a quantitative fine-grained manual evaluation approach to comparing
the performance of different machine translation (MT) systems. We build upon the well …
Agreement is overrated: A plea for correlation to assess human evaluation reliability
Inter-Annotator Agreement (IAA) is used as a means of assessing the quality of NLG
evaluation data, in particular, its reliability. According to existing scales of IAA interpretation …
Evaluation methodologies in automatic question generation 2013-2018
In the last few years Automatic Question Generation (AQG) has attracted increasing interest.
In this paper we survey the evaluation methodologies used in AQG. Based on a sample of …