Efficient methods for natural language processing: A survey

M Treviso, JU Lee, T Ji, B Aken, Q Cao… - Transactions of the …, 2023 - direct.mit.edu
Recent work in natural language processing (NLP) has yielded appealing results from
scaling model parameters and training data; however, using only scale to improve …

Accelerating transformer inference for translation via parallel decoding

A Santilli, S Severino, E Postolache, V Maiorca… - arXiv preprint arXiv …, 2023 - arxiv.org
Autoregressive decoding limits the efficiency of transformers for Machine Translation (MT).
The community proposed specific network architectures and learning-based methods to …

Deep encoder, shallow decoder: Reevaluating non-autoregressive machine translation

J Kasai, N Pappas, H Peng, J Cross… - arXiv preprint arXiv …, 2020 - arxiv.org
Much recent effort has been invested in non-autoregressive neural machine translation,
which appears to be an efficient alternative to state-of-the-art autoregressive machine …

One country, 700+ languages: NLP challenges for underrepresented languages and dialects in Indonesia

AF Aji, GI Winata, F Koto, S Cahyawijaya… - arXiv preprint arXiv …, 2022 - arxiv.org
NLP research is impeded by a lack of resources and awareness of the challenges presented
by underrepresented languages and dialects. Focusing on the languages spoken in …

Fully non-autoregressive neural machine translation: Tricks of the trade

J Gu, X Kong - arXiv preprint arXiv:2012.15833, 2020 - arxiv.org
Fully non-autoregressive neural machine translation (NAT) is proposed to simultaneously
predict tokens with a single forward pass of the neural network, which significantly reduces the …

MulDA: A multilingual data augmentation framework for low-resource cross-lingual NER

L Liu, B Ding, L Bing, S Joty, L Si… - Proceedings of the 59th …, 2021 - aclanthology.org
Named Entity Recognition (NER) for low-resource languages is both a practical and
challenging research problem. This paper addresses zero-shot transfer for cross-lingual …

Imitation attacks and defenses for black-box machine translation systems

E Wallace, M Stern, D Song - arXiv preprint arXiv:2004.15015, 2020 - arxiv.org
Adversaries may look to steal or attack black-box NLP systems, either for financial gain or to
exploit model errors. One setting of particular interest is machine translation (MT), where …

Losing Heads in the Lottery: Pruning Transformer Attention in Neural Machine Translation

M Behnke, K Heafield - The 2020 Conference on Empirical …, 2020 - research.ed.ac.uk
The attention mechanism is the crucial component of the transformer architecture. Recent
research shows that most attention heads are not confident in their decisions and can be …

When attention meets fast recurrence: Training language models with reduced compute

T Lei - arXiv preprint arXiv:2102.12459, 2021 - arxiv.org
Large language models have become increasingly difficult to train because of the growing
computation time and cost. In this work, we present SRU++, a highly efficient architecture …

Finetuning pretrained transformers into RNNs

J Kasai, H Peng, Y Zhang, D Yogatama… - arXiv preprint arXiv …, 2021 - arxiv.org
Transformers have outperformed recurrent neural networks (RNNs) in natural language
generation. But this comes with a significant computational cost, as the attention …