- Academic Search

A comprehensive survey of AI-generated content (AIGC): A history of generative AI from GAN to ChatGPT

Y Cao, S Li, Y Liu, Z Yan, Y Dai, PS Yu… - arXiv preprint arXiv …, 2023 - arxiv.org

Recently, ChatGPT, along with DALL-E-2 and Codex, has been gaining significant attention
from society. As a result, many individuals have become interested in related resources and …

Cited by 818

[PDF] arxiv.org

A comprehensive survey on applications of transformers for deep learning tasks

S Islam, H Elmekki, A Elsebai, J Bentahar… - Expert Systems with …, 2024 - Elsevier

Abstract Transformers are Deep Neural Networks (DNN) that utilize a self-attention
mechanism to capture contextual relationships within sequential data. Unlike traditional …

Cited by 196

[PDF] stableaiprompts.com

The dawn of LMMs: Preliminary explorations with GPT-4V(ision)

Z Yang, L Li, K Lin, J Wang, CC Lin… - arXiv preprint arXiv …, 2023 - stableaiprompts.com

Large multimodal models (LMMs) extend large language models (LLMs) with multi-sensory
skills, such as visual understanding, to achieve stronger generic intelligence. In this paper …

Cited by 588

[PDF] nowpublishers.com

Multimodal foundation models: From specialists to general-purpose assistants

C Li, Z Gan, Z Yang, J Yang, L Li… - … and Trends® in …, 2024 - nowpublishers.com

Neural compression is the application of neural networks and other machine learning
methods to data compression. Recent advances in statistical machine learning have opened …

Cited by 228

[PDF] nih.gov

MedCLIP: Contrastive learning from unpaired medical images and text

Z Wang, Z Wu, D Agarwal, J Sun - Proceedings of the …, 2022 - pmc.ncbi.nlm.nih.gov

Existing vision-text contrastive learning like CLIP (Radford et al., 2021) aims to match the
paired image and caption embeddings while pushing others apart, which improves …

Cited by 408

[PDF] thecvf.com

Image as a foreign language: BEiT pretraining for vision and vision-language tasks

W Wang, H Bao, L Dong, J Bjorck… - Proceedings of the …, 2023 - openaccess.thecvf.com

A big convergence of language, vision, and multimodal pretraining is emerging. In this work,
we introduce a general-purpose multimodal foundation model BEiT-3, which achieves …

Cited by 563

[PDF] neurips.cc

An image is worth 32 tokens for reconstruction and generation

Q Yu, M Weber, X Deng, X Shen… - Advances in Neural …, 2025 - proceedings.neurips.cc

Recent advancements in generative models have highlighted the crucial role of image
tokenization in the efficient synthesis of high-resolution images. Tokenization, which …

Cited by 52

[PDF] ieee.org

Multimodal learning with transformers: A survey

P Xu, X Zhu, DA Clifton - IEEE Transactions on Pattern Analysis …, 2023 - ieeexplore.ieee.org

Transformer is a promising neural network learner, and has achieved great success in
various machine learning tasks. Thanks to the recent prevalence of multimodal applications …

Cited by 640

[PDF] arxiv.org

GIT: A generative image-to-text transformer for vision and language

J Wang, Z Yang, X Hu, L Li, K Lin, Z Gan, Z Liu… - arXiv preprint arXiv …, 2022 - arxiv.org

In this paper, we design and train a Generative Image-to-text Transformer, GIT, to unify
vision-language tasks such as image/video captioning and question answering. While …

Cited by 562

[PDF] arxiv.org

Simple open-vocabulary object detection

M Minderer, A Gritsenko, A Stone, M Neumann… - European conference on …, 2022 - Springer

Combining simple architectures with large-scale pre-training has led to massive
improvements in image classification. For object detection, pre-training and scaling …

Cited by 519

Pixel-BERT: Aligning image pixels with text by deep multi-modal transformers
