Hallucinations in large multilingual translation models
Hallucinated translations can severely undermine trust and raise safety issues when machine
translation systems are deployed in the wild. Previous research on the topic focused on …
COMET-22: Unbabel-IST 2022 submission for the metrics shared task
In this paper, we present the joint contribution of Unbabel and IST to the WMT 2022 Metrics
Shared Task. Our primary submission, dubbed COMET-22, is an ensemble between a …
Exploring human-like translation strategy with large language models
Large language models (LLMs) have demonstrated impressive capabilities in general
scenarios, exhibiting a level of aptitude that approaches, and in some aspects even surpasses …
The devil is in the errors: Leveraging large language models for fine-grained machine translation evaluation
Automatic evaluation of machine translation (MT) is a critical tool driving the rapid iterative
development of MT systems. While considerable progress has been made on estimating a …
xcomet: Transparent machine translation evaluation through fine-grained error detection
Widely used learned metrics for machine translation evaluation, such as COMET and
BLEURT, estimate the quality of a translation hypothesis by providing a single sentence …
What makes a good story and how can we measure it? a comprehensive survey of story evaluation
With the development of artificial intelligence, particularly the success of Large Language
Models (LLMs), the quantity and quality of automatically generated stories have significantly …
Tigerscore: Towards building explainable metric for all text generation tasks
We present TIGERScore, a Trained metric that follows Instruction Guidance to perform
Explainable and Reference-free evaluation over a wide …
Findings of the WMT 2023 shared task on quality estimation
We report the results of the WMT 2023 shared task on Quality Estimation, in which the
challenge is to predict the quality of the output of neural machine translation systems at the …
Efficient benchmarking (of language models)
The increasing versatility of language models (LMs) has given rise to a new class of
benchmarks that comprehensively assess a broad range of capabilities. Such benchmarks …
The inside story: Towards better understanding of machine translation neural evaluation metrics
Neural metrics for machine translation evaluation, such as COMET, exhibit significant
improvements in their correlation with human judgments, as compared to traditional metrics …