SummaReranker: A multi-task mixture-of-experts re-ranking framework for abstractive summarization

M Ravaut, S Joty, NF Chen - arXiv preprint arXiv:2203.06569, 2022 - arxiv.org
Sequence-to-sequence neural networks have recently achieved great success in abstractive
summarization, especially through fine-tuning large pre-trained language models on the …

Learning to break the loop: Analyzing and mitigating repetitions for neural text generation

J Xu, X Liu, J Yan, D Cai, H Li… - Advances in Neural …, 2022 - proceedings.neurips.cc
While large-scale neural language models, such as GPT2 and BART, have achieved
impressive results on various text generation tasks, they tend to get stuck in undesirable …

Repetition in repetition out: Towards understanding neural text degeneration from the data perspective

H Li, T Lan, Z Fu, D Cai, L Liu… - Advances in …, 2023 - proceedings.neurips.cc
There are a number of diverging hypotheses about the neural text degeneration problem, i.e.,
generating repetitive and dull loops, which makes this problem both interesting and …

Understanding in-context learning from repetitions

J Yan, J Xu, C Song, C Wu, Y Li, Y Zhang - arXiv preprint arXiv …, 2023 - arxiv.org
This paper explores the elusive mechanism underpinning in-context learning in Large
Language Models (LLMs). Our work provides a novel perspective by examining in-context …

Nearest neighbor knowledge distillation for neural machine translation

Z Yang, R Sun, X Wan - arXiv preprint arXiv:2205.00479, 2022 - arxiv.org
k-nearest-neighbor machine translation (kNN-MT), proposed by Khandelwal et al. (2021), has
achieved many state-of-the-art results in machine translation tasks. Although effective, kNN …
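For readers unfamiliar with the technique named in this entry, a minimal sketch of the retrieval-and-interpolation step that defines kNN-MT (Khandelwal et al., 2021) follows; the toy NumPy datastore layout, the function name, and the hyperparameter values are illustrative assumptions, not settings from either paper.

# Sketch of kNN-MT next-token prediction: retrieve neighbors from a datastore
# of (decoder hidden state, target token) pairs and interpolate with the base
# MT model's distribution. Hyperparameters (k, temperature, lam) are illustrative.
import numpy as np

def knn_mt_distribution(query, datastore_keys, datastore_values,
                        model_probs, vocab_size, k=4, temperature=10.0, lam=0.5):
    """query:            (d,)   current decoder hidden state
    datastore_keys:   (N, d) stored decoder states from the training corpus
    datastore_values: (N,)   target-token ids aligned with the keys
    model_probs:      (V,)   next-token distribution from the NMT model"""
    # L2 distances from the current decoder state to every stored key
    dists = np.linalg.norm(datastore_keys - query, axis=1)
    nearest = np.argsort(dists)[:k]

    # Softmax over negative distances of the k retrieved neighbors
    weights = np.exp(-dists[nearest] / temperature)
    weights /= weights.sum()

    # Scatter the neighbor weights onto their associated target tokens
    knn_probs = np.zeros(vocab_size)
    for w, idx in zip(weights, nearest):
        knn_probs[datastore_values[idx]] += w

    # Final distribution: lambda * p_kNN + (1 - lambda) * p_MT
    return lam * knn_probs + (1.0 - lam) * model_probs

The interpolation weight lambda controls how strongly the retrieved neighbors can override the parametric model at each decoding step.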

R2D2: Robust data-to-text with replacement detection

L Nan, LJY Flores, Y Zhao, Y Liu, L Benson… - arXiv preprint arXiv …, 2022 - arxiv.org
Unfaithful text generation is a common problem for text generation systems. In the case of
Data-to-Text (D2T) systems, the factuality of the generated text is particularly crucial for any …

From Self-Attention to Markov Models: Unveiling the Dynamics of Generative Transformers

ME Ildiz, Y Huang, Y Li, AS Rawat, S Oymak - arxiv preprint arxiv …, 2024 - arxiv.org
Modern language models rely on the transformer architecture and attention mechanism to
perform language understanding and text generation. In this work, we study learning a 1 …

InferDPT: Privacy-preserving inference for black-box large language model

M Tong, K Chen, Y Qi, J Zhang… - arXiv preprint arXiv …, 2023 - mengtong0110.github.io
Large language models (LLMs), represented by ChatGPT, have greatly simplified text
generation tasks. However, they have also raised concerns about privacy risks such as data …

Exploring automatic text simplification of German narrative documents

T Schomacker, T Dönicke… - arXiv preprint arXiv …, 2023 - arxiv.org
In this paper, we apply transformer-based Natural Language Generation (NLG) techniques
to the problem of text simplification. Currently, there are only a few German datasets …

Decoupled non-parametric knowledge distillation for end-to-end speech translation

H Zhang, N Si, Y Chen, W Zhang… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Existing techniques often attempt to transfer knowledge from a powerful machine
translation (MT) model to a speech translation (ST) model with some elaborate techniques, which …
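As a point of reference for the MT-to-ST transfer described above, here is a minimal sketch of plain token-level knowledge distillation, the standard parametric form of such transfer (the decoupled non-parametric method in the paper itself differs); the function name, temperature, mixing weight, and padding id are illustrative assumptions.

# Sketch of token-level knowledge distillation: the ST student is trained on the
# gold translation while also matching the MT teacher's output distribution.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, gold_ids,
                      alpha=0.5, temperature=2.0, pad_id=0):
    """student_logits: (batch, tgt_len, vocab) from the ST model (speech input)
    teacher_logits: (batch, tgt_len, vocab) from the MT model (transcript input)
    gold_ids:       (batch, tgt_len) reference target-language token ids"""
    # Standard cross-entropy against the reference translation
    ce = F.cross_entropy(student_logits.transpose(1, 2), gold_ids,
                         ignore_index=pad_id)

    # KL divergence pulls the student's distribution toward the teacher's
    t = temperature
    kl = F.kl_div(F.log_softmax(student_logits / t, dim=-1),
                  F.softmax(teacher_logits / t, dim=-1),
                  reduction="batchmean") * (t * t)

    # Mix the two objectives; alpha trades supervision off against imitation
    return alpha * ce + (1.0 - alpha) * kl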