A survey on video moment localization

M Liu, L Nie, Y Wang, M Wang, Y Rui - ACM Computing Surveys, 2023 - dl.acm.org
Video moment localization, also known as video moment retrieval, aims to search a target
segment within a video described by a given natural language query. Beyond the task of …

Egocentric video-language pretraining

KQ Lin, J Wang, M Soldan, M Wray… - Advances in …, 2022 - proceedings.neurips.cc
Video-Language Pretraining (VLP), which aims to learn transferable representation
to advance a wide range of video-text downstream tasks, has recently received increasing …

Revisiting the "video" in video-language understanding

S Buch, C Eyzaguirre, A Gaidon, J Wu… - Proceedings of the …, 2022 - openaccess.thecvf.com
What makes a video task uniquely suited for videos, beyond what can be understood from a
single image? Building on recent progress in self-supervised image-language models, we …

EgoVLPv2: Egocentric video-language pre-training with fusion in the backbone

S Pramanick, Y Song, S Nag, KQ Lin… - Proceedings of the …, 2023 - openaccess.thecvf.com
Video-language pre-training (VLP) has become increasingly important due to its ability to
generalize to various vision and language tasks. However, existing egocentric VLP …

Temporal sentence grounding in videos: A survey and future directions

H Zhang, A Sun, W Jing, JT Zhou - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Temporal sentence grounding in videos (TSGV), aka, natural language video localization
(NLVL) or video moment retrieval (VMR), aims to retrieve a temporal moment that …

Support-set bottlenecks for video-text representation learning

M Patrick, PY Huang, Y Asano, F Metze… - arXiv preprint arXiv …, 2020 - arxiv.org
The dominant paradigm for learning video-text representations--noise contrastive learning--
increases the similarity of the representations of pairs of samples that are known to be …

Learning 2d temporal adjacent networks for moment localization with natural language

S Zhang, H Peng, J Fu, J Luo - Proceedings of the AAAI Conference on …, 2020 - ojs.aaai.org
We address the problem of retrieving a specific moment from an untrimmed video by a query
sentence. This is a challenging problem because a target moment may take place in …

UnLoc: A unified framework for video localization tasks

S Yan, X Xiong, A Nagrani, A Arnab… - Proceedings of the …, 2023 - openaccess.thecvf.com
While large-scale image-text pretrained models such as CLIP have been used for multiple
video-level tasks on trimmed videos, their use for temporal localization in untrimmed videos …

TubeDETR: Spatio-temporal video grounding with transformers

A Yang, A Miech, J Sivic, I Laptev… - Proceedings of the …, 2022 - openaccess.thecvf.com
We consider the problem of localizing a spatio-temporal tube in a video corresponding to a
given text query. This is a challenging task that requires the joint and efficient modeling of …

VidChapters-7M: Video chapters at scale

A Yang, A Nagrani, I Laptev, J Sivic… - Advances in Neural …, 2024 - proceedings.neurips.cc
Segmenting untrimmed videos into chapters enables users to quickly navigate to the
information of their interest. This important topic has been understudied due to the lack of …