Pre-trained language models and their applications
Pre-trained language models have achieved striking success in natural language
processing (NLP), leading to a paradigm shift from supervised learning to pre-training …
Large-scale multi-modal pre-trained models: A comprehensive survey
With the urgent demand for generalized deep models, many pre-trained big models have been
proposed, such as bidirectional encoder representations from transformers (BERT), vision transformer (ViT) …
Qwen-VL: A versatile vision-language model for understanding, localization, text reading, and beyond
In this work, we introduce the Qwen-VL series, a set of large-scale vision-language models
(LVLMs) designed to perceive and understand both texts and images. Starting from the …
Uni-ControlNet: All-in-one control to text-to-image diffusion models
Text-to-Image diffusion models have made tremendous progress over the past two years,
enabling the generation of highly realistic images based on open-domain text descriptions …
Towards open-world recommendation with knowledge augmentation from large language models
Recommender systems play a vital role in various online services. However, their insulated
nature of being trained and deployed separately within a specific closed domain limits their access …
Using DeepSpeed and Megatron to train Megatron-Turing NLG 530B, a large-scale generative language model
Pretrained general-purpose language models can achieve state-of-the-art accuracies in
various natural language processing domains by adapting to downstream tasks via zero …
Vector quantized diffusion model for text-to-image synthesis
We present the vector quantized diffusion (VQ-Diffusion) model for text-to-image generation.
This method is based on a vector quantized variational autoencoder (VQ-VAE) whose latent …
FILIP: Fine-grained interactive language-image pre-training
Unsupervised large-scale vision-language pre-training has shown promising advances on
various downstream tasks. Existing methods often model the cross-modal interaction either …
A survey of transformers
Transformers have achieved great success in many artificial intelligence fields, such as
natural language processing, computer vision, and audio processing. Therefore, it is natural …