Pre-trained language models in medicine: A survey

X Luo, Z Deng, B Yang, MY Luo - Artificial Intelligence in Medicine, 2024 - Elsevier
With the rapid progress in Natural Language Processing (NLP), Pre-trained Language
Models (PLM) such as BERT, BioBERT, and ChatGPT have shown great potential in various …

COMET-22: Unbabel-IST 2022 submission for the metrics shared task

R Rei, JGC De Souza, D Alves, C Zerva… - Proceedings of the …, 2022 - aclanthology.org
In this paper, we present the joint contribution of Unbabel and IST to the WMT 2022 Metrics
Shared Task. Our primary submission–dubbed COMET-22–is an ensemble between a …

xCOMET: Transparent Machine Translation Evaluation through Fine-grained Error Detection

NM Guerreiro, R Rei, D Stigt, L Coheur… - Transactions of the …, 2024 - direct.mit.edu
Widely used learned metrics for machine translation evaluation, such as COMET and BLEURT,
estimate the quality of a translation hypothesis by providing a single sentence-level score …

Error analysis prompting enables human-like translation evaluation in large language models

Q Lu, B Qiu, L Ding, K Zhang, T Kocmi… - arXiv preprint arXiv …, 2023 - arxiv.org
Generative large language models (LLMs), e.g., ChatGPT, have demonstrated remarkable
proficiency across several NLP tasks, such as machine translation, text summarization …

System combination via quality estimation for grammatical error correction

MR Qorib, HT Ng - arXiv preprint arXiv:2310.14947, 2023 - arxiv.org
Quality estimation models have been developed to assess the corrections made by
grammatical error correction (GEC) models when the reference or gold-standard corrections …

INSTRUCTSCORE: Explainable text generation evaluation with finegrained feedback

W Xu, D Wang, L Pan, Z Song, M Freitag… - arXiv preprint arXiv …, 2023 - arxiv.org
Automatically evaluating the quality of language generation is critical. Although recent
learned metrics show high correlation with human judgement, these metrics cannot explain …

The devil is in the errors: Leveraging large language models for fine-grained machine translation evaluation

P Fernandes, D Deutsch, M Finkelstein, P Riley… - arXiv preprint arXiv …, 2023 - arxiv.org
Automatic evaluation of machine translation (MT) is a critical tool driving the rapid iterative
development of MT systems. While considerable progress has been made on estimating a …

Understanding and detecting hallucinations in neural machine translation via model introspection

W Xu, S Agrawal, E Briakou, MJ Martindale… - Transactions of the …, 2023 - direct.mit.edu
Neural sequence generation models are known to “hallucinate”, by producing outputs that
are unrelated to the source text. These hallucinations are potentially harmful, yet it remains …

Efficient benchmarking of language models

Y Perlitz, E Bandel, A Gera, O Arviv, L Ein-Dor… - arXiv preprint arXiv …, 2023 - arxiv.org
The increasing versatility of language models (LMs) has given rise to a new class of
benchmarks that comprehensively assess a broad range of capabilities. Such benchmarks …

Towards making the most of LLM for translation quality estimation

H Huang, S Wu, X Liang, B Wang, Y Shi, P Wu… - … Conference on Natural …, 2023 - Springer
Machine Translation Quality Estimation (QE) aims to evaluate the quality of
machine translation without relying on references. Recently, Large-scale Language Model …