Google Scholar

Vision-language pre-training: Basics, recent advances, and future trends

Z Gan, L Li, C Li, L Wang, Z Liu… - Foundations and Trends …, 2022 - nowpublishers.com

This monograph surveys vision-language pre-training (VLP) methods for multimodal
intelligence that have been developed in the last few years. We group these approaches …

Cited by 199 · All 7 versions

[PDF] arxiv.org

A comprehensive survey of few-shot learning: Evolution, applications, challenges, and opportunities

Y Song, T Wang, P Cai, SK Mondal… - ACM Computing Surveys, 2023 - dl.acm.org

Few-shot learning (FSL) has emerged as an effective learning method and shows great
potential. Despite the recent creative works in tackling FSL tasks, learning valid information …

Cited by 413 · All 3 versions

[PDF] arxiv.org

Videomamba: State space model for efficient video understanding

K Li, X Li, Y Wang, Y He, Y Wang, L Wang… - European Conference on …, 2024 - Springer

Addressing the dual challenges of local redundancy and global dependencies in video
understanding, this work innovatively adapts the Mamba to the video domain. The proposed …

Cited by 150 · All 2 versions

[PDF] arxiv.org

Minigpt-v2: large language model as a unified interface for vision-language multi-task learning

J Chen, D Zhu, X Shen, X Li, Z Liu, P Zhang… - ar…

Blip: Bootstrapping language-image pre-training for unified vision-language understanding and generation

J Li, D Li, C Xiong, S Hoi - International conference on …, 2022 - proceedings.mlr.press

Abstract Vision-Language Pre-training (VLP) has advanced the performance for many vision-
language tasks. However, most existing pre-trained models only excel in either …

Cited by 4248 · All 5 versions

[PDF] mlr.press

mplug-2: A modularized multi-modal foundation model across text, image and video

H Xu, Q Ye, M Yan, Y Shi, J Ye, Y Xu… - International …, 2023 - proceedings.mlr.press

Recent years have witnessed a big convergence of language, vision, and multi-modal
pretraining. In this work, we present mPLUG-2, a new unified paradigm with modularized …

Cited by 129 · All 6 versions

[PDF] thecvf.com

Unmasked teacher: Towards training-efficient video foundation models

K Li, Y Wang, Y Li, Y Wang, Y He… - Proceedings of the …, 2023 - openaccess.thecvf.com

Abstract Video Foundation Models (VFMs) have received limited exploration due to high
computational costs and data scarcity. Previous VFMs rely on Image Foundation Models …

Cited by 144 · All 5 versions

Saved to My library

Vision-language pre-training: Basics, recent advances, and future trends

A comprehensive survey of few-shot learning: Evolution, applications, challenges, and opportunities

Videomamba: State space model for efficient video understanding

Minigpt-v2: large language model as a unified interface for vision-language multi-task learning

mplug-2: A modularized multi-modal foundation model across text, image and video

Unmasked teacher: Towards training-efficient video foundation models