Bringing order into the realm of Transformer-based language models for artificial intelligence and law

CM Greco, A Tagarelli - Artificial Intelligence and Law, 2024 - Springer
Transformer-based language models (TLMs) have widely been recognized to be a cutting-edge
technology for the successful development of deep-learning-based solutions to …

Retrieval-augmented generation for large language models: A survey

Y Gao, Y Xiong, X Gao, K Jia, J Pan, Y Bi… - arXiv preprint arXiv …, 2023 - simg.baai.ac.cn
Large language models (LLMs) demonstrate powerful capabilities, but they still face
challenges in practical applications, such as hallucinations, slow knowledge updates, and …

Lift yourself up: Retrieval-augmented text generation with self-memory

X Cheng, D Luo, X Chen, L Liu… - Advances in Neural …, 2023 - proceedings.neurips.cc
With direct access to human-written reference as memory, retrieval-augmented generation
has achieved much progress in a wide range of text generation tasks. Since better memory …

Findings of the 2021 conference on machine translation (WMT21)

F Akhbardeh, A Arkhangorodsky, M Biesialska… - Proceedings of the sixth …, 2021 - cris.fbk.eu
This paper presents the results of the news translation task, the multilingual low-resource
translation for Indo-European languages, the triangular translation task, and the automatic …

Lexically constrained decoding for sequence generation using grid beam search

C Hokamp, Q Liu - arXiv preprint arXiv:1704.07138, 2017 - arxiv.org
We present Grid Beam Search (GBS), an algorithm which extends beam search to allow the
inclusion of pre-specified lexical constraints. The algorithm can be used with any model that …

Understanding and detecting hallucinations in neural machine translation via model introspection

W Xu, S Agrawal, E Briakou, MJ Martindale… - Transactions of the …, 2023 - direct.mit.edu
Neural sequence generation models are known to “hallucinate”, by producing outputs that
are unrelated to the source text. These hallucinations are potentially harmful, yet it remains …

The United Nations parallel corpus v1.0

M Ziemski, M Junczys-Dowmunt… - Proceedings of the …, 2016 - aclanthology.org
This paper describes the creation process and statistics of the official United Nations Parallel
Corpus, the first parallel corpus composed from United Nations documents published by the …

Improving neural machine translation models with monolingual data

R Sennrich, B Haddow, A Birch - arXiv preprint arXiv:1511.06709, 2015 - arxiv.org
Neural Machine Translation (NMT) has obtained state-of-the-art performance for several
language pairs, while only using parallel data for training. Target-side monolingual data …

PPDB: The paraphrase database

J Ganitkevitch, B Van Durme… - Proceedings of the …, 2013 - aclanthology.org
We present the 1.0 release of our paraphrase database, PPDB. Its English portion, PPDB:
Eng, contains over 220 million paraphrase pairs, consisting of 73 million phrasal and 8 …

Automatic language identification in texts: A survey

T Jauhiainen, M Lui, M Zampieri, T Baldwin… - Journal of Artificial …, 2019 - jair.org
Language identification ("LI") is the problem of determining the natural language that a
document or part thereof is written in. Automatic LI has been extensively researched for over …