Summary of ChatGPT-related research and perspective towards the future of large language models

Y Liu, T Han, S Ma, J Zhang, Y Yang, J Tian, H He, A Li… - Meta-Radiology, 2023 - Elsevier
This paper presents a comprehensive survey of ChatGPT-related (GPT-3.5 and GPT-4)
research, state-of-the-art large language models (LLM) from the GPT series, and their …

LLM-based NLG evaluation: Current status and challenges

M Gao, X Hu, J Ruan, X Pu, X Wan - arXiv preprint arXiv:2402.01383, 2024 - arxiv.org
Evaluating natural language generation (NLG) is a vital but challenging problem in artificial
intelligence. Traditional evaluation metrics mainly capturing content (e.g., n-gram) overlap …

PaLM 2 technical report

R Anil, AM Dai, O Firat, M Johnson, D Lepikhin… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce PaLM 2, a new state-of-the-art language model that has better multilingual and
reasoning capabilities and is more compute-efficient than its predecessor PaLM. PaLM 2 is …

How good are GPT models at machine translation? A comprehensive evaluation

A Hendy, M Abdelrehim, A Sharaf, V Raunak… - arXiv preprint arXiv …, 2023 - arxiv.org
Generative Pre-trained Transformer (GPT) models have shown remarkable capabilities for
natural language generation, but their performance for machine translation has not been …

Towards making the most of ChatGPT for machine translation

K Peng, L Ding, Q Zhong, L Shen, X Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
ChatGPT shows remarkable capabilities for machine translation (MT). Several prior studies
have shown that it achieves comparable results to commercial systems for high-resource …

Large language models are state-of-the-art evaluators of translation quality

T Kocmi, C Federmann - arXiv preprint arXiv:2302.14520, 2023 - arxiv.org
We describe GEMBA, a GPT-based metric for assessment of translation quality, which works
both with a reference translation and without. In our evaluation, we focus on zero-shot …

COMET-22: Unbabel-IST 2022 submission for the metrics shared task

R Rei, JGC De Souza, D Alves, C Zerva… - Proceedings of the …, 2022 - aclanthology.org
In this paper, we present the joint contribution of Unbabel and IST to the WMT 2022 Metrics
Shared Task. Our primary submission, dubbed COMET-22, is an ensemble between a …

xCOMET: Transparent Machine Translation Evaluation through Fine-grained Error Detection

NM Guerreiro, R Rei, D Stigt, L Coheur… - Transactions of the …, 2024 - direct.mit.edu
Widely used learned metrics for machine translation evaluation, such as COMET and BLEURT,
estimate the quality of a translation hypothesis by providing a single sentence-level score …

Error analysis prompting enables human-like translation evaluation in large language models

Q Lu, B Qiu, L Ding, K Zhang, T Kocmi… - arXiv preprint arXiv …, 2023 - arxiv.org
Generative large language models (LLMs), e.g., ChatGPT, have demonstrated remarkable
proficiency across several NLP tasks, such as machine translation, text summarization …

Exploring human-like translation strategy with large language models

Z He, T Liang, W Jiao, Z Zhang, Y Yang… - Transactions of the …, 2024 - direct.mit.edu
Large language models (LLMs) have demonstrated impressive capabilities in general
scenarios, exhibiting a level of aptitude that approaches, in some aspects even surpasses …