A survey of techniques for optimizing transformer inference
Recent years have seen a phenomenal rise in the performance and applications of
transformer neural networks. The family of transformer networks, including Bidirectional …
A survey on neural speech synthesis
Text-to-speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural
speech given text, is a hot research topic in speech, language, and machine learning …
Meta learning for natural language processing: A survey
Deep learning has been the mainstream technique in the natural language processing (NLP)
area. However, these techniques require large amounts of labeled data and are less generalizable across …
SqueezeLLM: Dense-and-sparse quantization
Generative Large Language Models (LLMs) have demonstrated remarkable results for a
wide range of tasks. However, deploying these models for inference has been a significant …
A fast post-training pruning framework for transformers
Pruning is an effective way to reduce the huge inference cost of Transformer models.
However, prior work on pruning Transformers requires retraining the models. This can add …
Speculative decoding with big little decoder
The recent emergence of Large Language Models based on the Transformer architecture
has enabled dramatic advancements in the field of Natural Language Processing. However …
Full stack optimization of transformer inference: A survey
Recent advances in state-of-the-art DNN architecture design have been moving toward
Transformer models. These models achieve superior accuracy across a wide range of …
SpikingBERT: Distilling BERT to train spiking language models using implicit differentiation
Large Language Models (LLMs), though growing exceedingly powerful, comprise orders
of magnitude fewer neurons and synapses than the human brain. However, they require …
Neural architecture search for transformers: A survey
Transformer-based Deep Neural Network architectures have gained tremendous interest
due to their effectiveness in various applications across Natural Language Processing (NLP) …
NAS-Bench-NLP: Neural architecture search benchmark for natural language processing
Neural Architecture Search (NAS) is a promising and rapidly evolving research area.
Training a large number of neural networks requires an exceptional amount of …