Self-supervised learning for videos: A survey
The remarkable success of deep learning in various domains relies on the availability of
large-scale annotated datasets. However, obtaining annotations is expensive and requires …
A review on multimodal zero-shot learning
Multimodal learning provides a path to fully utilize all types of information related to the
modeling target to provide the model with a global vision. Zero-shot learning (ZSL) is a …
Learning video representations from large language models
We introduce LAVILA, a new approach to learning video-language representations by
leveraging Large Language Models (LLMs). We repurpose pre-trained LLMs to be …
Egocentric video-language pretraining
Abstract Video-Language Pretraining (VLP), which aims to learn transferable representation
to advance a wide range of video-text downstream tasks, has recently received increasing …
Egovlpv2: Egocentric video-language pre-training with fusion in the backbone
Video-language pre-training (VLP) has become increasingly important due to its ability to
generalize to various vision and language tasks. However, existing egocentric VLP …
Bridging video-text retrieval with multiple choice questions
Pre-training a model to learn transferable video-text representation for retrieval has attracted
a lot of attention in recent years. Previous dominant works mainly adopt two separate …
Verbs in action: Improving verb understanding in video-language models
Understanding verbs is crucial to modelling how people and objects interact with each other
and the environment through space and time. Recently, state-of-the-art video-language …
Multi-modal transformer for video retrieval
The task of retrieving video content relevant to natural language queries plays a critical role
in effectively handling internet-scale datasets. Most of the existing methods for this caption-to …
Self-supervised multimodal versatile networks
Videos are a rich source of multi-modal supervision. In this work, we learn representations
using self-supervision by leveraging three modalities naturally present in videos: visual …
Rescaling egocentric vision: Collection, pipeline and challenges for epic-kitchens-100
This paper introduces the pipeline to extend the largest dataset in egocentric vision, EPIC-
KITCHENS. The effort culminates in EPIC-KITCHENS-100, a collection of 100 hours, 20M …