MobileBERT: a compact task-agnostic BERT for resource-limited devices

Z Sun, H Yu, X Song, R Liu, Y Yang, D Zhou - arXiv preprint arXiv …, 2020 - arxiv.org
Natural Language Processing (NLP) has recently achieved great success by using huge pre-trained models with hundreds of millions of parameters. However, these models suffer from …

LLM inference unveiled: Survey and roofline model insights

Z Yuan, Y Shang, Y Zhou, Z Dong, Z Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org
The field of efficient Large Language Model (LLM) inference is rapidly evolving, presenting a
unique blend of opportunities and challenges. Although the field has expanded and is …
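
As a quick illustration of the roofline analysis this survey builds on, the sketch below plugs assumed hardware numbers (loosely A100-like, not measurements) into the model's min(peak compute, bandwidth × arithmetic intensity) bound to show why LLM decode tends to be memory-bound while prefill is compute-bound:

```python
# Roofline model: attainable throughput is capped either by peak compute
# or by memory bandwidth times arithmetic intensity (FLOPs per byte).
# Hardware numbers are illustrative assumptions (roughly A100 FP16).
PEAK_FLOPS = 312e12   # assumed peak compute, FLOP/s
PEAK_BW = 2.0e12      # assumed memory bandwidth, bytes/s

def attainable(intensity_flops_per_byte: float) -> float:
    return min(PEAK_FLOPS, PEAK_BW * intensity_flops_per_byte)

# Decode is dominated by matrix-vector products: ~2 FLOPs per weight
# read (2 bytes in FP16), so intensity is near 1 and decode is
# memory-bound. Prefill batches many tokens into matrix-matrix products;
# 256 FLOPs/byte is an assumed example, putting it compute-bound.
print(f"decode : {attainable(1.0):.2e} FLOP/s")
print(f"prefill: {attainable(256.0):.2e} FLOP/s")
```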

A survey on transformer compression

Y Tang, Y Wang, J Guo, Z Tu, K Han, H Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large models based on the Transformer architecture play increasingly vital roles in artificial
intelligence, particularly within the realms of natural language processing (NLP) and …

Speculative decoding with big little decoder

S Kim, K Mangalam, S Moon, J Malik… - Advances in …, 2024 - proceedings.neurips.cc
The recent emergence of Large Language Models based on the Transformer architecture
has enabled dramatic advancements in the field of Natural Language Processing. However …
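
A minimal sketch of the draft-then-verify loop behind this line of work: a fast small model proposes a block of tokens and a large model checks them, rolling back at the first disagreement. The toy `small_model`/`large_model` functions and the fallback rule are assumptions for illustration, not the paper's actual BiLD policy:

```python
import random

random.seed(0)
VOCAB = list(range(10))

def small_model(prefix):           # fast drafter (toy stand-in)
    return random.choice(VOCAB)

def large_model(prefix):           # accurate verifier (toy stand-in)
    return (sum(prefix) + 1) % 10

def speculative_decode(prompt, n_tokens=8, draft_len=4):
    out = list(prompt)
    while len(out) - len(prompt) < n_tokens:
        # 1) small model drafts a block of tokens autoregressively
        draft = []
        for _ in range(draft_len):
            draft.append(small_model(out + draft))
        # 2) large model checks each drafted position (in practice one
        #    parallel forward pass); keep tokens up to the first
        #    disagreement, then substitute the large model's token
        for i, tok in enumerate(draft):
            verified = large_model(out + draft[:i])
            out.append(tok if verified == tok else verified)
            if verified != tok:
                break
    return out[:len(prompt) + n_tokens]

print(speculative_decode([1, 2, 3]))
```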

A survey on non-autoregressive generation for neural machine translation and beyond

Y Xiao, L Wu, J Guo, J Li, M Zhang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Non-autoregressive (NAR) generation, which was first proposed in neural machine translation
(NMT) to speed up inference, has attracted much attention in both machine learning and …
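
To make the NAR-vs-AR distinction concrete, the toy sketch below contrasts the two decoding modes; `logits_fn` is a hypothetical stand-in for a trained model, so only the call pattern (a sequential loop versus one parallel pass) is meaningful:

```python
import torch

torch.manual_seed(0)
VOCAB, LEN = 32, 6

def logits_fn(tokens: torch.Tensor) -> torch.Tensor:
    """Hypothetical stand-in for a model: one logit row per position."""
    return torch.randn(tokens.shape[0], VOCAB)

# Autoregressive: LEN sequential calls, each conditioned on the prefix.
seq = torch.zeros(1, dtype=torch.long)            # BOS placeholder
for _ in range(LEN):
    nxt = logits_fn(seq)[-1].argmax()
    seq = torch.cat([seq, nxt.unsqueeze(0)])

# Non-autoregressive: one call predicts all LEN positions in parallel.
placeholders = torch.zeros(LEN, dtype=torch.long)
nar_seq = logits_fn(placeholders).argmax(dim=-1)

print("AR :", seq[1:].tolist())
print("NAR:", nar_seq.tolist())
```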

Glancing transformer for non-autoregressive neural machine translation

L Qian, H Zhou, Y Bao, M Wang, L Qiu… - arXiv preprint arXiv …, 2020 - arxiv.org
Recent work on non-autoregressive neural machine translation (NAT) aims at improving
efficiency via parallel decoding without sacrificing quality. However, existing NAT …
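
A rough sketch of GLAT-style glancing sampling: the number of ground-truth tokens revealed to the decoder scales with how many tokens the first parallel pass got wrong. The `ratio`, the toy tensors, and the uniform choice of positions are simplifying assumptions:

```python
import torch

torch.manual_seed(0)

def glancing_inputs(pred, target, mask_id, ratio=0.5):
    """Reveal ground-truth tokens at a rate tied to first-pass error."""
    n_wrong = (pred != target).sum().item()
    n_reveal = int(ratio * n_wrong)
    inputs = torch.full_like(target, mask_id)     # start fully masked
    if n_reveal > 0:
        idx = torch.randperm(target.numel())[:n_reveal]
        inputs[idx] = target[idx]                 # glance at some targets
    return inputs

target = torch.randint(1, 100, (8,))
pred = torch.randint(1, 100, (8,))   # toy first parallel-pass prediction
print(glancing_inputs(pred, target, mask_id=0))
```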

Step-unrolled denoising autoencoders for text generation

N Savinov, J Chung, M Binkowski, E Elsen… - arXiv preprint arXiv …, 2021 - arxiv.org
In this paper we propose a new generative model of text, Step-unrolled Denoising
Autoencoder (SUNDAE), that does not rely on autoregressive models. Similarly to denoising …
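
A hedged sketch of the unrolled-denoising training idea: corrupt the target, denoise once, then have the model denoise a sample of its own output for a second step, with cross-entropy against the clean target at both steps. The toy `model` lambda and the corruption rate are assumptions:

```python
import torch
import torch.nn.functional as F

def sundae_loss(model, target, vocab, corrupt_p=0.5):
    # corrupt: replace a random fraction of target tokens with noise
    noise = torch.randint(0, vocab, target.shape)
    mask = torch.rand(target.shape) < corrupt_p
    x = torch.where(mask, noise, target)

    logits1 = model(x)                       # step 1: denoise corruption
    loss1 = F.cross_entropy(logits1.transpose(1, 2), target)

    x2 = torch.distributions.Categorical(logits=logits1).sample()
    logits2 = model(x2)                      # step 2: denoise own sample
    loss2 = F.cross_entropy(logits2.transpose(1, 2), target)
    return loss1 + loss2

model = lambda x: torch.randn(*x.shape, 100)   # hypothetical stand-in
print(sundae_loss(model, torch.randint(0, 100, (2, 8)), vocab=100))
```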

Deep encoder, shallow decoder: Reevaluating non-autoregressive machine translation

J Kasai, N Pappas, H Peng, J Cross… - arXiv preprint arXiv …, 2020 - arxiv.org
Much recent effort has been invested in non-autoregressive neural machine translation,
which appears to be an efficient alternative to state-of-the-art autoregressive machine …
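
For concreteness, the layer-reallocation idea in PyTorch; the 12-1 split mirrors the kind of configuration the paper studies, while d_model and nhead here are assumed values:

```python
import torch.nn as nn

# Standard balanced split versus a deep-encoder/shallow-decoder split.
standard = nn.Transformer(d_model=512, nhead=8,
                          num_encoder_layers=6, num_decoder_layers=6)
deep_shallow = nn.Transformer(d_model=512, nhead=8,
                              num_encoder_layers=12, num_decoder_layers=1)

# The encoder runs once per sentence, but the decoder runs once per
# generated token, so moving layers into the encoder cuts per-token
# latency while keeping total depth comparable.
```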

Non-autoregressive machine translation with latent alignments

C Saharia, W Chan, S Saxena, M Norouzi - arXiv preprint arXiv …, 2020 - arxiv.org
This paper presents two strong methods, CTC and Imputer, for non-autoregressive machine
translation that model latent alignments with dynamic programming. We revisit CTC for …
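
A small sketch of the CTC side of this approach using PyTorch's built-in CTC loss, whose forward-backward dynamic program marginalizes over latent monotonic alignments; the 2x upsampling ratio and the toy tensors are assumptions:

```python
import torch
import torch.nn.functional as F

B, VOCAB, BLANK = 2, 50, 0
SRC_LEN = 5
T_FRAMES = 2 * SRC_LEN            # assumed 2x upsampled decoder length

# toy decoder outputs: (frames, batch, vocab) log-probabilities
log_probs = F.log_softmax(torch.randn(T_FRAMES, B, VOCAB), dim=-1)

targets = torch.randint(1, VOCAB, (B, 4))            # toy references
input_lengths = torch.full((B,), T_FRAMES, dtype=torch.long)
target_lengths = torch.full((B,), 4, dtype=torch.long)

# ctc_loss runs the forward-backward DP, summing over every monotonic
# alignment that collapses (via blanks/repeats) to the target sequence
loss = F.ctc_loss(log_probs, targets, input_lengths, target_lengths,
                  blank=BLANK)
print(loss)
```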

Order-agnostic cross entropy for non-autoregressive machine translation

C Du, Z Tu, J Jiang - International conference on machine …, 2021 - proceedings.mlr.press
We propose a new training objective named order-agnostic cross entropy (OaXE) for fully
non-autoregressive translation (NAT) models. OaXE improves the standard cross-entropy …
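
A hedged sketch of the OaXE objective: build a position-by-token negative log-likelihood cost matrix and use the Hungarian algorithm to find the lowest-cost ordering of the target tokens; averaging rather than summing the matched costs is a simplification here:

```python
import torch
import torch.nn.functional as F
from scipy.optimize import linear_sum_assignment

def oaxe_loss(logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """logits: (T, V) model outputs; target: (T,) reference tokens."""
    log_probs = F.log_softmax(logits, dim=-1)
    # cost[i, j] = -log P(target token j emitted at position i)
    cost = -log_probs[:, target]                     # (T, T)
    rows, cols = linear_sum_assignment(cost.detach().numpy())
    return cost[torch.as_tensor(rows), torch.as_tensor(cols)].mean()

print(oaxe_loss(torch.randn(5, 100), torch.randint(0, 100, (5,))))
```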