Deja vu: Contextual sparsity for efficient LLMs at inference time
Large language models (LLMs) with hundreds of billions of parameters have sparked a new
wave of exciting AI applications. However, they are computationally expensive at inference …
Image super-resolution with non-local sparse attention
Both non-local (NL) operation and sparse representation are crucial for Single Image Super-
Resolution (SISR). In this paper, we investigate their combinations and propose a novel Non …
Survey of vector database management systems
There are now over 20 commercial vector database management systems (VDBMSs), all
produced within the past five years. But embedding-based retrieval has been studied for …
Reformer: The efficient transformer
Large Transformer models routinely achieve state-of-the-art results on a number of tasks but
training these models can be prohibitively costly, especially on long sequences. We …
The limitations of federated learning in sybil settings
Federated learning over distributed multi-party data is an emerging paradigm that iteratively
aggregates updates from a group of devices to train a globally shared model. Relying on a …
ETC: Encoding long and structured inputs in transformers
Transformer models have advanced the state of the art in many Natural Language
Processing (NLP) tasks. In this paper, we present a new Transformer architecture, Extended …
Accelerating large-scale inference with anisotropic vector quantization
Quantization-based techniques are the current state-of-the-art for scaling maximum inner
product search to massive databases. Traditional approaches to quantization aim to …
Mitigating sybils in federated learning poisoning
Machine learning (ML) over distributed multi-party data is required for a variety of domains.
Existing approaches, such as federated learning, collect the outputs computed by a group of …
Deep k-nearest neighbors: Towards confident, interpretable and robust deep learning
Deep neural networks (DNNs) enable innovative applications of machine learning like
image recognition, machine translation, or malware detection. However, deep learning is …
Scatterbrain: Unifying sparse and low-rank attention
Recent advances in efficient Transformers have exploited either the sparsity or low-rank
properties of attention matrices to reduce the computational and memory bottlenecks of …