LLM-based NLG evaluation: Current status and challenges
Evaluating natural language generation (NLG) is a vital but challenging problem in artificial
intelligence. Traditional evaluation metrics mainly capture content overlap (e.g., n-gram) …
xCOMET: Transparent Machine Translation Evaluation through Fine-grained Error Detection
Widely used learned metrics for machine translation evaluation, such as COMET and BLEURT,
estimate the quality of a translation hypothesis by providing a single sentence-level score …
Error analysis prompting enables human-like translation evaluation in large language models
Generative large language models (LLMs), e.g., ChatGPT, have demonstrated remarkable
proficiency across several NLP tasks, such as machine translation, text summarization …
Adapting large language models for document-level machine translation
Large language models (LLMs) have significantly advanced various natural language
processing (NLP) tasks. Recent research indicates that moderately-sized LLMs often …
LLaMAX: Scaling linguistic horizons of LLMs by enhancing translation capabilities beyond 100 languages
Large Language Models (LLMs) demonstrate remarkable translation capabilities in high-
resource language tasks, yet their performance in low-resource languages is hindered by …
Navigating the metrics maze: Reconciling score magnitudes and accuracies
Ten years ago a single metric, BLEU, governed progress in machine translation research.
For better or worse, there is no such consensus today, and consequently it is difficult for …
TEaR: Improving LLM-based machine translation with systematic self-refinement
Large Language Models (LLMs) have achieved impressive results in Machine Translation
(MT). However, careful human evaluations reveal that the translations produced by LLMs …
Assessing the Role of Context in Chat Translation Evaluation: Is Context Helpful and Under What Conditions?
Despite the recent success of automatic metrics for assessing translation quality, their
application in evaluating the quality of machine-translated chats has been limited. Unlike …
Machine translation meta evaluation through translation accuracy challenge sets
Recent machine translation (MT) metrics calibrate their effectiveness by correlating with
human judgment. However, these results are often obtained by averaging predictions across …
PrExMe! Large-scale prompt exploration of open-source LLMs for machine translation and summarization evaluation
Large language models (LLMs) have revolutionized the field of NLP. Notably, their in-
context learning capabilities also enable their use as evaluation metrics for natural language …