A survey of controllable text generation using transformer-based pre-trained language models
Controllable Text Generation (CTG) is an emerging area in the field of natural language
generation (NLG). It is regarded as crucial for the development of advanced text generation …
generation (NLG). It is regarded as crucial for the development of advanced text generation …
[HTML][HTML] A survey of transformers
Transformers have achieved great success in many artificial intelligence fields, such as
natural language processing, computer vision, and audio processing. Therefore, it is natural …
natural language processing, computer vision, and audio processing. Therefore, it is natural …
Simulating 500 million years of evolution with a language model
More than three billion years of evolution have produced an image of biology encoded into
the space of natural proteins. Here we show that language models trained at scale on …
the space of natural proteins. Here we show that language models trained at scale on …
Adaptformer: Adapting vision transformers for scalable visual recognition
Abstract Pretraining Vision Transformers (ViTs) has achieved great success in visual
recognition. A following scenario is to adapt a ViT to various image and video recognition …
recognition. A following scenario is to adapt a ViT to various image and video recognition …
Vision gnn: An image is worth graph of nodes
Network architecture plays a key role in the deep learning-based computer vision system.
The widely-used convolutional neural network and transformer treat the image as a grid or …
The widely-used convolutional neural network and transformer treat the image as a grid or …
Scaling up your kernels to 31x31: Revisiting large kernel design in cnns
We revisit large kernel design in modern convolutional neural networks (CNNs). Inspired by
recent advances in vision transformers (ViTs), in this paper, we demonstrate that using a few …
recent advances in vision transformers (ViTs), in this paper, we demonstrate that using a few …
Metaformer is actually what you need for vision
Transformers have shown great potential in computer vision tasks. A common belief is their
attention-based token mixer module contributes most to their competence. However, recent …
attention-based token mixer module contributes most to their competence. However, recent …
Tridet: Temporal action detection with relative boundary modeling
In this paper, we present a one-stage framework TriDet for temporal action detection.
Existing methods often suffer from imprecise boundary predictions due to the ambiguous …
Existing methods often suffer from imprecise boundary predictions due to the ambiguous …
SpectralFormer: Rethinking hyperspectral image classification with transformers
Hyperspectral (HS) images are characterized by approximately contiguous spectral
information, enabling the fine identification of materials by capturing subtle spectral …
information, enabling the fine identification of materials by capturing subtle spectral …
A survey of visual transformers
Transformer, an attention-based encoder–decoder model, has already revolutionized the
field of natural language processing (NLP). Inspired by such significant achievements, some …
field of natural language processing (NLP). Inspired by such significant achievements, some …