Hallucinations in large multilingual translation models

NM Guerreiro, DM Alves, J Waldendorf… - Transactions of the …, 2023 - direct.mit.edu
Hallucinated translations can severely undermine trust and raise safety issues when machine
translation systems are deployed in the wild. Previous research on the topic focused on …

COMET-22: Unbabel-IST 2022 submission for the metrics shared task

R Rei, JGC De Souza, D Alves, C Zerva… - Proceedings of the …, 2022 - aclanthology.org
In this paper, we present the joint contribution of Unbabel and IST to the WMT 2022 Metrics
Shared Task. Our primary submission, dubbed COMET-22, is an ensemble between a …

Exploring human-like translation strategy with large language models

Z He, T Liang, W Jiao, Z Zhang, Y Yang… - Transactions of the …, 2024 - direct.mit.edu
Large language models (LLMs) have demonstrated impressive capabilities in general
scenarios, exhibiting a level of aptitude that approaches, in some aspects even surpasses …

The devil is in the errors: Leveraging large language models for fine-grained machine translation evaluation

P Fernandes, D Deutsch, M Finkelstein, P Riley… - arXiv preprint arXiv …, 2023 - arxiv.org
Automatic evaluation of machine translation (MT) is a critical tool driving the rapid iterative
development of MT systems. While considerable progress has been made on estimating a …

xCOMET: Transparent machine translation evaluation through fine-grained error detection

NM Guerreiro, R Rei, D van Stigt, L Coheur… - arXiv preprint arXiv …, 2023 - arxiv.org
Widely used learned metrics for machine translation evaluation, such as COMET and
BLEURT, estimate the quality of a translation hypothesis by providing a single sentence …

What makes a good story and how can we measure it? A comprehensive survey of story evaluation

D Yang, Q Jin - arXiv preprint arXiv:2408.14622, 2024 - arxiv.org
With the development of artificial intelligence, particularly the success of Large Language
Models (LLMs), the quantity and quality of automatically generated stories have significantly …

TIGERScore: Towards building explainable metric for all text generation tasks

D Jiang, Y Li, G Zhang, W Huang, BY Lin… - … on Machine Learning …, 2023 - openreview.net
We present TIGERScore, a Trained metric that follows Instruction Guidance to perform
Explainable, and Reference-free evaluation over a wide …

Findings of the WMT 2023 shared task on quality estimation

F Blain, C Zerva, R Rei, NM Guerreiro… - Proceedings of the …, 2023 - aclanthology.org
We report the results of the WMT 2023 shared task on Quality Estimation, in which the
challenge is to predict the quality of the output of neural machine translation systems at the …

Efficient benchmarking (of language models)

Y Perlitz, E Bandel, A Gera, O Arviv, L Ein-Dor… - arXiv preprint arXiv …, 2023 - arxiv.org
The increasing versatility of language models (LMs) has given rise to a new class of
benchmarks that comprehensively assess a broad range of capabilities. Such benchmarks …

The inside story: Towards better understanding of machine translation neural evaluation metrics

R Rei, NM Guerreiro, M Treviso, L Coheur… - arXiv preprint arXiv …, 2023 - arxiv.org
Neural metrics for machine translation evaluation, such as COMET, exhibit significant
improvements in their correlation with human judgments, as compared to traditional metrics …