A comprehensive survey of AI-generated content (AIGC): A history of generative AI from GAN to ChatGPT

Y Cao, S Li, Y Liu, Z Yan, Y Dai, PS Yu… - arXiv preprint arXiv …, 2023 - arxiv.org
Recently, ChatGPT, along with DALL-E-2 and Codex, has been gaining significant attention
from society. As a result, many individuals have become interested in related resources and …

A comprehensive survey on applications of transformers for deep learning tasks

S Islam, H Elmekki, A Elsebai, J Bentahar… - Expert Systems with …, 2024 - Elsevier
Transformers are Deep Neural Networks (DNNs) that utilize a self-attention
mechanism to capture contextual relationships within sequential data. Unlike traditional …

Image as a foreign language: BEiT pretraining for vision and vision-language tasks

W Wang, H Bao, L Dong, J Bjorck… - Proceedings of the …, 2023 - openaccess.thecvf.com
A big convergence of language, vision, and multimodal pretraining is emerging. In this work,
we introduce a general-purpose multimodal foundation model BEiT-3, which achieves …

The dawn of LMMs: Preliminary explorations with GPT-4V(ision)

Z Yang, L Li, K Lin, J Wang, CC Lin… - arXiv preprint arXiv …, 2023 - stableaiprompts.com
Large multimodal models (LMMs) extend large language models (LLMs) with multi-sensory
skills, such as visual understanding, to achieve stronger generic intelligence. In this paper …

Multimodal learning with transformers: A survey

P Xu, X Zhu, DA Clifton - IEEE Transactions on Pattern Analysis …, 2023 - ieeexplore.ieee.org
The Transformer is a promising neural network learner and has achieved great success in
various machine learning tasks. Thanks to the recent prevalence of multimodal applications …

GIT: A generative image-to-text transformer for vision and language

J Wang, Z Yang, X Hu, L Li, K Lin, Z Gan, Z Liu… - arXiv preprint arXiv …, 2022 - arxiv.org
In this paper, we design and train a Generative Image-to-text Transformer, GIT, to unify
vision-language tasks such as image/video captioning and question answering. While …

Multimodal foundation models: From specialists to general-purpose assistants

C Li, Z Gan, Z Yang, J Yang, L Li… - … and Trends® in …, 2024 - nowpublishers.com

Simple open-vocabulary object detection

M Minderer, A Gritsenko, A Stone, M Neumann… - … on Computer Vision, 2022 - Springer
Combining simple architectures with large-scale pre-training has led to massive
improvements in image classification. For object detection, pre-training and scaling …

Vision-language pre-training: Basics, recent advances, and future trends

Z Gan, L Li, C Li, L Wang, Z Liu… - Foundations and Trends …, 2022 - nowpublishers.com
This monograph surveys vision-language pre-training (VLP) methods for multimodal
intelligence that have been developed in the last few years. We group these approaches …

MedCLIP: Contrastive learning from unpaired medical images and text

Z Wang, Z Wu, D Agarwal, J Sun - arXiv preprint arXiv:2210.10163, 2022 - arxiv.org
Existing vision-text contrastive learning like CLIP aims to match the paired image and
caption embeddings while pushing others apart, which improves representation …