Google Scholar

Vision-language pre-training: Basics, recent advances, and future trends

Z Gan, L Li, C Li, L Wang, Z Liu… - Foundations and Trends …, 2022 - nowpublishers.com

This monograph surveys vision-language pre-training (VLP) methods for multimodal
intelligence that have been developed in the last few years. We group these approaches …

Cited by 198 - All 7 versions

[PDF] springer.com

Large-scale multi-modal pre-trained models: A comprehensive survey

X Wang, G Chen, G Qian, P Gao, XY Wei… - Machine Intelligence …, 2023 - Springer

With the urgent demand for generalized deep models, many pre-trained big models are
proposed, such as bidirectional encoder representations (BERT), vision transformer (ViT) …

Cited by 199 - All 8 versions

[PDF] neurips.cc

Visual instruction tuning

H Liu, C Li, Q Wu, YJ Lee - Advances in neural information …, 2023 - proceedings.neurips.cc

Instruction tuning large language models (LLMs) using machine-generated instruction-
following data has been shown to improve zero-shot capabilities on new tasks, but the idea …

Cited by 5487 - All 18 versions

[PDF] thecvf.com

Diffusion-based generation, optimization, and planning in 3D scenes

S Huang, Z Wang, P Li, B Jia, T Liu… - Proceedings of the …, 2023 - openaccess.thecvf.com

We introduce SceneDiffuser, a conditional generative model for 3D scene understanding.
SceneDiffuser provides a unified model for solving scene-conditioned generation …

Cited by 184 - All 9 versions

[PDF] ieee.org

Multimodal learning with transformers: A survey

P Xu, X Zhu, DA Clifton - IEEE Transactions on Pattern Analysis …, 2023 - ieeexplore.ieee.org

Transformer is a promising neural network learner, and has achieved great success in
various machine learning tasks. Thanks to the recent prevalence of multimodal applications …

Cited by 648 - All 11 versions

[PDF] aaai.org

NavGPT: Explicit reasoning in vision-and-language navigation with large language models

G Zhou, Y Hong, Q Wu - Proceedings of the AAAI Conference on …, 2024 - ojs.aaai.org

Trained with an unprecedented scale of data, large language models (LLMs) like ChatGPT
and GPT-4 exhibit the emergence of significant reasoning abilities from model scaling. Such …

Cited by 133 - All 6 versions

[PDF] arxiv.org

Foundation models for decision making: Problems, methods, and opportunities

S Yang, O Nachum, Y Du, J Wei, P Abbeel… - arXiv preprint arXiv …, 2023 - arxiv.org

Foundation models pretrained on diverse data at scale have demonstrated extraordinary
capabilities in a wide range of vision and language tasks. When such models are deployed …

Cited by 171 - All 3 versions

[PDF] arxiv.org

How much can CLIP benefit vision-and-language tasks?

S Shen, LH Li, H Tan, M Bansal, A Rohrbach… - arXiv preprint arXiv …, 2021 - arxiv.org

Most existing Vision-and-Language (V&L) models rely on pre-trained visual encoders, using
a relatively small set of manually-annotated data (as compared to web-crawled data), to …

Cited by 460 - All 3 versions

[PDF] arxiv.org

Transformers in vision: A survey

S Khan, M Naseer, M Hayat, SW Zamir… - ACM Computing …, 2022 - dl.acm.org

Astounding results from Transformer models on natural language tasks have intrigued the
vision community to study their application to computer vision problems. Among their salient …

Cited by 2975 - All 8 versions

[PDF] springer.com

VLP: A survey on vision-language pre-training

FL Chen, DZ Zhang, ML Han, XY Chen, J Shi… - Machine Intelligence …, 2023 - Springer

In the past few years, the emergence of pre-training models has brought uni-modal fields
such as computer vision (CV) and natural language processing (NLP) to a new era …

Cited by 220 - All 8 versions

Towards learning a generic agent for vision-and-language navigation via pre-training