Google Scholar

Vision-language pre-training: Basics, recent advances, and future trends

Z Gan, L Li, C Li, L Wang, Z Liu… - Foundations and Trends …, 2022 - nowpublishers.com

This monograph surveys vision-language pre-training (VLP) methods for multimodal
intelligence that have been developed in the last few years. We group these approaches …

Cited by 197

[PDF] arxiv.org

A comprehensive survey of few-shot learning: Evolution, applications, challenges, and opportunities

Y Song, T Wang, P Cai, SK Mondal… - ACM Computing Surveys, 2023 - dl.acm.org

Few-shot learning (FSL) has emerged as an effective learning method and shows great
potential. Despite the recent creative works in tackling FSL tasks, learning valid information …

Cited by 408

[PDF] arxiv.org

Video-ChatGPT: Towards detailed video understanding via large vision and language models

M Maaz, H Rasheed, S Khan, FS Khan - arXiv preprint arXiv:2306.05424, 2023 - arxiv.org

Conversation agents fueled by Large Language Models (LLMs) are providing a new way to
interact with visual data. While there have been initial attempts for image-based …

Cited by 573

[PDF] arxiv.org

InternVideo: General video foundation models via generative and discriminative learning

Y Wang, K Li, Y Li, Y He, B Huang, Z Zhao… - arXiv preprint arXiv …, 2022 - arxiv.org

The foundation models have recently shown excellent performance on a variety of
downstream tasks in computer vision. However, most existing vision foundation models …

Cited by 328

[PDF] arxiv.org

Expanding language-image pretrained models for general video recognition

B Ni, H Peng, M Chen, S Zhang, G Meng, J Fu… - … on Computer Vision, 2022 - Springer

Contrastive language-image pretraining has shown great success in learning visual-textual
joint representation from web-scale data, demonstrating remarkable “zero-shot” …

Cited by 347

[HTML] sciencedirect.com

[HTML] Review of large vision models and visual prompt engineering

J Wang, Z Liu, L Zhao, Z Wu, C Ma, S Yu, H Dai… - Meta-Radiology, 2023 - Elsevier

Visual prompt engineering is a fundamental methodology in the field of visual and image
artificial general intelligence. As the development of large vision models progresses, the …

Cited by 146

[PDF] thecvf.com

DenseCLIP: Language-guided dense prediction with context-aware prompting

Y Rao, W Zhao, G Chen, Y Tang… - Proceedings of the …, 2022 - openaccess.thecvf.com

Recent progress has shown that large-scale pre-training using contrastive image-text pairs
can be a promising alternative for high-quality visual representation learning from natural …

Cited by 619

[PDF] springer.com

Large-scale multi-modal pre-trained models: A comprehensive survey

X Wang, G Chen, G Qian, P Gao, XY Wei… - Machine Intelligence …, 2023 - Springer

With the urgent demand for generalized deep models, many pre-trained big models are
proposed, such as bidirectional encoder representations (BERT), vision transformer (ViT) …

Cited by 191

[PDF] neurips.cc

ST-Adapter: Parameter-efficient image-to-video transfer learning

J Pan, Z Lin, X Zhu, J Shao, H Li - Advances in Neural …, 2022 - proceedings.neurips.cc

Capitalizing on large pre-trained models for various downstream tasks of interest have
recently emerged with promising performance. Due to the ever-growing model size, the …

Cited by 252

[PDF] thecvf.com

PointCLIP: Point cloud understanding by CLIP

R Zhang, Z Guo, W Zhang, K Li… - Proceedings of the …, 2022 - openaccess.thecvf.com

Recently, zero-shot and few-shot learning via Contrastive Vision-Language Pre-training
(CLIP) have shown inspirational performance on 2D visual recognition, which learns to …

Cited by 465

ActionCLIP: A new paradigm for video action recognition
