A survey of techniques for optimizing transformer inference

KT Chitty-Venkata, S Mittal, M Emani… - Journal of Systems …, 2023 - Elsevier
Recent years have seen a phenomenal rise in the performance and applications of
transformer neural networks. The family of transformer networks, including Bidirectional …

A survey on neural speech synthesis

X Tan, T Qin, F Soong, TY Liu - arXiv preprint arXiv:2106.15561, 2021 - arxiv.org
Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural
speech given text, is a hot research topic in speech, language, and machine learning …

Meta learning for natural language processing: A survey

H Lee, SW Li, NT Vu - arXiv preprint arXiv:2205.01500, 2022 - arxiv.org
Deep learning has been the mainstream technique in the natural language processing (NLP)
area. However, these techniques require large amounts of labeled data and are less generalizable across …

SqueezeLLM: Dense-and-sparse quantization

S Kim, C Hooper, A Gholami, Z Dong, X Li… - arXiv preprint arXiv …, 2023 - arxiv.org
Generative Large Language Models (LLMs) have demonstrated remarkable results for a
wide range of tasks. However, deploying these models for inference has been a significant …
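
The dense-and-sparse decomposition named in the title can be illustrated with a minimal sketch: a small fraction of large-magnitude outlier weights is kept exact in a sparse matrix, while the remaining dense part is quantized to a low bit-width. The outlier fraction, bit-width, and uniform quantizer below are illustrative assumptions, not the paper's sensitivity-based non-uniform scheme.

```python
import numpy as np

def dense_and_sparse_quantize(W, outlier_frac=0.005, bits=3):
    """Split W into a sparse full-precision outlier part and a low-bit dense part.

    outlier_frac, bits, and the uniform quantizer are illustrative choices;
    SqueezeLLM itself uses sensitivity-based non-uniform quantization.
    """
    # Keep the largest-magnitude weights exact (sparse outliers).
    k = max(1, int(outlier_frac * W.size))
    thresh = np.partition(np.abs(W).ravel(), -k)[-k]
    outlier_mask = np.abs(W) >= thresh
    sparse_part = np.where(outlier_mask, W, 0.0)

    # Uniformly quantize the remaining dense weights to `bits` bits.
    dense = np.where(outlier_mask, 0.0, W)
    max_abs = np.abs(dense).max()
    scale = max_abs / (2 ** (bits - 1) - 1) if max_abs > 0 else 1.0
    q = np.clip(np.round(dense / scale), -(2 ** (bits - 1)), 2 ** (bits - 1) - 1)
    dense_part = q * scale  # dequantized dense component

    return dense_part, sparse_part  # W is approximated by dense_part + sparse_part

W = np.random.randn(256, 256).astype(np.float32)
dense_part, sparse_part = dense_and_sparse_quantize(W)
print("mean reconstruction error:", np.abs(W - (dense_part + sparse_part)).mean())
```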

A fast post-training pruning framework for transformers

W Kwon, S Kim, MW Mahoney… - Advances in …, 2022 - proceedings.neurips.cc
Pruning is an effective way to reduce the huge inference cost of Transformer models.
However, prior work on pruning Transformers requires retraining the models. This can add …
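
The retraining-free setting mentioned in the snippet can be shown with a toy sketch: score attention heads on a small calibration set and mask out the lowest-scoring ones without any gradient updates. The L2-activation score below is a simple illustrative proxy, not the importance criterion the paper itself uses for its mask search.

```python
import numpy as np

def prune_heads_post_training(head_outputs, keep_ratio=0.5):
    """Retraining-free structured pruning of attention heads (toy sketch).

    head_outputs: array of shape (num_samples, num_heads, head_dim) collected
    on a small calibration set. Heads with the lowest average L2 activation
    are masked out and no gradient updates are performed; the L2 score is an
    illustrative proxy, not the paper's own importance criterion.
    """
    scores = np.linalg.norm(head_outputs, axis=-1).mean(axis=0)  # (num_heads,)
    num_keep = max(1, int(keep_ratio * scores.size))
    keep = np.argsort(scores)[-num_keep:]
    mask = np.zeros(scores.size, dtype=bool)
    mask[keep] = True
    return mask  # True = keep head, False = prune head

calib = np.random.randn(128, 12, 64)  # 128 calibration samples, 12 heads of width 64
print(prune_heads_post_training(calib, keep_ratio=0.5))
```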

Speculative decoding with big little decoder

S Kim, K Mangalam, S Moon, J Malik… - Advances in …, 2024 - proceedings.neurips.cc
The recent emergence of Large Language Models based on the Transformer architecture
has enabled dramatic advancements in the field of Natural Language Processing. However …
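
Speculative decoding pairs a small draft model with a large target model: the small model cheaply proposes a short run of tokens and the large model verifies them, falling back to its own prediction at the first disagreement. The greedy accept-until-mismatch rule and the callable model interfaces below are simplifying assumptions, not the paper's exact big-little fallback policy.

```python
def speculative_decode(draft_model, target_model, prompt, draft_len=4, max_new=32):
    """Toy greedy speculative decoding loop.

    draft_model(tokens) and target_model(tokens) are assumed to return a single
    greedy next-token prediction. Accepting drafted tokens until the first
    mismatch is a simplification of the fallback policy described in the paper.
    """
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new:
        # 1. The small model drafts a short continuation.
        draft, ctx = [], list(tokens)
        for _ in range(draft_len):
            nxt = draft_model(ctx)
            draft.append(nxt)
            ctx.append(nxt)
        # 2. The large model verifies the drafted tokens.
        accepted, ctx = [], list(tokens)
        for tok in draft:
            verified = target_model(ctx)
            if verified == tok:
                accepted.append(tok)
                ctx.append(tok)
            else:
                accepted.append(verified)  # fall back to the large model's token
                break
        tokens.extend(accepted)
    return tokens

# Usage with trivial stand-in "models" over integer tokens:
draft_model = lambda toks: (toks[-1] + 1) % 100
target_model = lambda toks: (toks[-1] + 1) % 100
print(speculative_decode(draft_model, target_model, prompt=[0], max_new=8))
```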

Full stack optimization of transformer inference: a survey

S Kim, C Hooper, T Wattanawong, M Kang… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent advances in state-of-the-art DNN architecture design have been moving toward
Transformer models. These models achieve superior accuracy across a wide range of …

SpikingBERT: Distilling BERT to train spiking language models using implicit differentiation

M Bal, A Sengupta - Proceedings of the AAAI conference on artificial …, 2024 - ojs.aaai.org
Large Language Models (LLMs), though growing exceedingly powerful, comprise orders
of magnitude fewer neurons and synapses than the human brain. However, they require …
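
The teacher-student setup the title refers to can be sketched with standard logit distillation: the student is trained to match the teacher's temperature-softened output distribution. This generic KL-based loss is only an illustration of the distillation setting; the paper's spiking-neuron training via implicit differentiation is not reproduced here, and the temperature value is an arbitrary choice.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Generic KL-based logit distillation loss (not the paper's method).

    Both logit arrays have shape (batch, num_classes); the loss is the mean
    KL divergence between the temperature-softened teacher and student
    distributions, scaled by temperature**2 as is conventional.
    """
    t = temperature
    p = softmax(teacher_logits / t)   # soft teacher targets
    q = softmax(student_logits / t)   # student predictions
    kl = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)
    return float(kl.mean() * t * t)

student = np.random.randn(8, 2)  # student (e.g. spiking model) logits
teacher = np.random.randn(8, 2)  # frozen teacher (e.g. BERT) logits
print(distillation_loss(student, teacher))
```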

Neural architecture search for transformers: A survey

KT Chitty-Venkata, M Emani, V Vishwanath… - IEEE …, 2022 - ieeexplore.ieee.org
Transformer-based Deep Neural Network architectures have gained tremendous interest
due to their effectiveness in various applications across Natural Language Processing (NLP) …

NAS-Bench-NLP: Neural architecture search benchmark for natural language processing

N Klyuchnikov, I Trofimov, E Artemova… - IEEE …, 2022 - ieeexplore.ieee.org
Neural Architecture Search (NAS) is a promising and rapidly evolving research area.
Training a large number of neural networks requires an exceptional amount of …