Large language models effectively leverage document-level context for literary translation, but critical errors persist

M Karpinska, M Iyyer - arXiv preprint arXiv:2304.03245, 2023 - arxiv.org
Large language models (LLMs) are competitive with the state of the art on a wide range of
sentence-level translation datasets. However, their ability to translate paragraphs and …

Adapting large language models for document-level machine translation

M Wu, TT Vu, L Qu, G Foster, G Haffari - arXiv preprint arXiv:2401.06468, 2024 - arxiv.org
Large language models (LLMs) have significantly advanced various natural language
processing (NLP) tasks. Recent research indicates that moderately-sized LLMs often …

Investigating the translation performance of a large multilingual language model: the case of BLOOM

R Bawden, F Yvon - arXiv preprint arXiv:2303.01911, 2023 - arxiv.org
The NLP community recently saw the release of a new large open-access multilingual
language model, BLOOM (BigScience et al., 2022) covering 46 languages. We focus on …

DiscoScore: Evaluating text generation with BERT and discourse coherence

W Zhao, M Strube, S Eger - arXiv preprint arXiv:2201.11176, 2022 - arxiv.org
Recently, there has been a growing interest in designing text generation systems from a
discourse coherence perspective, e.g., modeling the interdependence between sentences …

Measuring and increasing context usage in context-aware machine translation

P Fernandes, K Yin, G Neubig, AFT Martins - arXiv preprint arXiv …, 2021 - arxiv.org
Recent work in neural machine translation has demonstrated both the necessity and
feasibility of using inter-sentential context--context from sentences other than those currently …

A survey on zero pronoun translation

L Wang, S Liu, M Xu, L Song, S Shi, Z Tu - arXiv preprint arXiv:2305.10196, 2023 - arxiv.org
Zero pronouns (ZPs) are frequently omitted in pro-drop languages (e.g., Chinese, Hungarian,
and Hindi), but should be recalled in non-pro-drop languages (e.g., English). This …

Clarify when necessary: Resolving ambiguity through interaction with LMs

MJQ Zhang, E Choi - arXiv preprint arXiv:2311.09469, 2023 - arxiv.org
Resolving ambiguities through interaction is a hallmark of natural language, and modeling
this behavior is a core challenge in crafting AI assistants. In this work, we study such …

Embarrassingly easy document-level MT metrics: How to convert any pretrained metric into a document-level metric

G Vernikos, B Thompson, P Mathur… - arXiv preprint arXiv …, 2022 - arxiv.org
We hypothesize that existing sentence-level machine translation (MT) metrics become less
effective when the human reference contains ambiguities. To verify this hypothesis, we …

Gender neutralization for an inclusive machine translation: from theoretical foundations to open challenges

A Piergentili, D Fucci, B Savoldi, L Bentivogli… - arXiv preprint arXiv …, 2023 - arxiv.org
Gender inclusivity in language technologies has become a prominent research topic. In this
study, we explore gender-neutral translation (GNT) as a form of gender inclusivity and a goal …

When does translation require context? A data-driven, multilingual exploration

P Fernandes, K Yin, E Liu, AFT Martins… - arXiv preprint arXiv …, 2021 - arxiv.org
Although proper handling of discourse significantly contributes to the quality of machine
translation (MT), these improvements are not adequately measured in common translation …