Vision-language pre-training: Basics, recent advances, and future trends

Z Gan, L Li, C Li, L Wang, Z Liu… - Foundations and Trends …, 2022 - nowpublishers.com
This monograph surveys vision-language pre-training (VLP) methods for multimodal
intelligence that have been developed in the last few years. We group these approaches …

Self-supervised learning for videos: A survey

MC Schiappa, YS Rawat, M Shah - ACM Computing Surveys, 2023 - dl.acm.org
The remarkable success of deep learning in various domains relies on the availability of
large-scale annotated datasets. However, obtaining annotations is expensive and requires …

Multimodal learning with transformers: A survey

P Xu, X Zhu, DA Clifton - IEEE Transactions on Pattern Analysis …, 2023 - ieeexplore.ieee.org
The Transformer is a promising neural network learner and has achieved great success in
various machine learning tasks. Thanks to the recent prevalence of multimodal applications …

Flamingo: a visual language model for few-shot learning

JB Alayrac, J Donahue, P Luc… - Advances in neural …, 2022 - proceedings.neurips.cc
Building models that can be rapidly adapted to novel tasks using only a handful of annotated
examples is an open challenge for multimodal machine learning research. We introduce …

ST-Adapter: Parameter-efficient image-to-video transfer learning

J Pan, Z Lin, X Zhu, J Shao, H Li - Advances in Neural …, 2022 - proceedings.neurips.cc
Capitalizing on large pre-trained models for various downstream tasks of interest has
recently emerged as a promising approach. Due to the ever-growing model size, the …
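The parameter-efficient transfer idea behind adapters can be sketched as a small residual bottleneck module inserted into a frozen backbone. The sketch below is a generic bottleneck adapter in NumPy, not ST-Adapter's exact module (which uses a depthwise 3D convolution for temporal modeling); the dimensions and names are illustrative assumptions.

```python
import numpy as np

def make_adapter(hidden_dim: int, bottleneck_dim: int, seed: int = 0):
    """Return (trainable_param_count, forward_fn) for a residual bottleneck adapter.

    Assumed illustrative form: down-project -> ReLU -> up-project, added
    back to the frozen backbone's activations via a residual connection.
    """
    rng = np.random.default_rng(seed)
    w_down = rng.normal(0.0, 0.02, (hidden_dim, bottleneck_dim))
    # Zero-initializing the up-projection makes the adapter start as the
    # identity, so inserting it does not perturb the pre-trained model.
    w_up = np.zeros((bottleneck_dim, hidden_dim))

    def forward(x: np.ndarray) -> np.ndarray:
        # x: (num_tokens, hidden_dim)
        h = np.maximum(x @ w_down, 0.0)  # down-project + ReLU
        return x + h @ w_up              # up-project + residual

    return w_down.size + w_up.size, forward

n_params, adapter = make_adapter(hidden_dim=768, bottleneck_dim=64)
full_layer_params = 768 * 768  # one dense layer of the frozen backbone
print(n_params, full_layer_params)
```

Only the adapter weights are trained; here they amount to a small fraction of a single backbone layer, which is the "parameter-efficient" part of the recipe.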

VideoCLIP: Contrastive pre-training for zero-shot video-text understanding

H Xu, G Ghosh, PY Huang, D Okhonko… - arXiv preprint arXiv …, 2021 - arxiv.org
We present VideoCLIP, a contrastive approach to pre-train a unified model for zero-shot
video and text understanding, without using any labels on downstream tasks. VideoCLIP …
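The contrastive recipe this family of methods builds on can be sketched as a symmetric InfoNCE objective over aligned (video, text) embedding pairs. The version below is a generic illustration, not VideoCLIP's exact formulation (which uses overlapping-clip positives and retrieval-augmented hard negatives); the embedding size and temperature are assumed values.

```python
import numpy as np

def log_softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable log-softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def contrastive_loss(video_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of aligned (video, text) pairs.

    The i-th video and i-th text are treated as the positive pair;
    all other in-batch pairings serve as negatives.
    """
    v = video_emb / np.linalg.norm(video_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = v @ t.T / temperature   # (B, B) cosine-similarity matrix
    idx = np.arange(len(v))          # diagonal entries are the positives
    loss_v2t = -log_softmax(logits)[idx, idx].mean()
    loss_t2v = -log_softmax(logits.T)[idx, idx].mean()
    return 0.5 * (loss_v2t + loss_t2v)

rng = rng_demo = np.random.default_rng(0)
video = rng.normal(size=(8, 256))
text = video + 0.01 * rng.normal(size=(8, 256))  # nearly aligned pairs
print(contrastive_loss(video, text))
```

Minimizing this loss pulls each clip's embedding toward its paired caption and pushes it away from the other captions in the batch, which is what makes zero-shot retrieval possible at inference time.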

VAST: A vision-audio-subtitle-text omni-modality foundation model and dataset

S Chen, H Li, Q Wang, Z Zhao… - Advances in Neural …, 2024 - proceedings.neurips.cc
Vision and text have been fully explored in contemporary video-text foundational models,
while other modalities such as audio and subtitles in videos have not received sufficient …

EgoVLPv2: Egocentric video-language pre-training with fusion in the backbone

S Pramanick, Y Song, S Nag, KQ Lin… - Proceedings of the …, 2023 - openaccess.thecvf.com
Video-language pre-training (VLP) has become increasingly important due to its ability to
generalize to various vision and language tasks. However, existing egocentric VLP …

Bridging video-text retrieval with multiple choice questions

Y Ge, Y Ge, X Liu, D Li, Y Shan… - Proceedings of the …, 2022 - openaccess.thecvf.com
Pre-training a model to learn transferable video-text representation for retrieval has attracted
a lot of attention in recent years. Previous dominant works mainly adopt two separate …

Advancing high-resolution video-language representation with large-scale video transcriptions

H Xue, T Hang, Y Zeng, Y Sun, B Liu… - Proceedings of the …, 2022 - openaccess.thecvf.com
We study joint video and language (VL) pre-training to enable cross-modality learning and
benefit plentiful downstream VL tasks. Existing works either extract low-quality video …