A survey of transformers
Transformers have achieved great success in many artificial intelligence fields, such as
natural language processing, computer vision, and audio processing. Therefore, it is natural …
A comprehensive survey on applications of transformers for deep learning tasks
Abstract Transformers are Deep Neural Networks (DNN) that utilize a self-attention
mechanism to capture contextual relationships within sequential data. Unlike traditional …
The Llama 3 herd of models
Modern artificial intelligence (AI) systems are powered by foundation models. This paper
presents a new set of foundation models, called Llama 3. It is a herd of language models …
Scaling autoregressive models for content-rich text-to-image generation
Abstract We present the Pathways [1] Autoregressive Text-to-Image (Parti) model, which
generates high-fidelity photorealistic images and supports content-rich synthesis involving …
HuBERT: Self-supervised speech representation learning by masked prediction of hidden units
Self-supervised approaches for speech representation learning are challenged by three
unique problems: (1) there are multiple sound units in each input utterance, (2) there is no …
WavLM: Large-scale self-supervised pre-training for full stack speech processing
Self-supervised learning (SSL) achieves great success in speech recognition, while limited
exploration has been attempted for other speech processing tasks. As speech signal …
CvT: Introducing convolutions to vision transformers
We present in this paper a new architecture, named Convolutional vision Transformer (CvT),
that improves Vision Transformer (ViT) in performance and efficiency by introducing …
Transformers learn in-context by gradient descent
J Von Oswald, E Niklasson… - International …, 2023 - proceedings.mlr.press
At present, the mechanisms of in-context learning in Transformers are not well understood
and remain mostly an intuition. In this paper, we suggest that training Transformers on auto …
Ego4D: Around the world in 3,000 hours of egocentric video
We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It
offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household …
Do transformers really perform badly for graph representation?
The Transformer architecture has become a dominant choice in many domains, such as
natural language processing and computer vision. Yet, it has not achieved competitive …