Google Scholar

A survey on video moment localization

M Liu, L Nie, Y Wang, M Wang, Y Rui - ACM Computing Surveys, 2023 - dl.acm.org

Video moment localization, also known as video moment retrieval, aims to search a target
segment within a video described by a given natural language query. Beyond the task of …

Cited by 34

MomentDiff: Generative video moment retrieval from random to real

P Li, CW Xie, H Xie, L Zhao, L Zhang… - Advances in neural …, 2024 - proceedings.neurips.cc

Video moment retrieval pursues an efficient and generalized solution to identify the specific
temporal segments within an untrimmed video that correspond to a given language …

Cited by 64

Detecting moments and highlights in videos via natural language queries

J Lei, TL Berg, M Bansal - Advances in Neural Information …, 2021 - proceedings.neurips.cc

Detecting customized moments and highlights from videos given natural language (NL) user
queries is an important but under-studied topic. One of the challenges in pursuing this …

Cited by 256

Query-dependent video representation for moment retrieval and highlight detection

WJ Moon, S Hyun, SU Park, D Park… - Proceedings of the …, 2023 - openaccess.thecvf.com

Recently, video moment retrieval and highlight detection (MR/HD) are being spotlighted as
the demand for video understanding is drastically increased. The key objective of MR/HD is …

Cited by 121

Learning 2D temporal adjacent networks for moment localization with natural language

S Zhang, H Peng, J Fu, J Luo - Proceedings of the AAAI Conference on …, 2020 - ojs.aaai.org

We address the problem of retrieving a specific moment from an untrimmed video by a query
sentence. This is a challenging problem because a target moment may take place in …

Cited by 521

VTimeLLM: Empower LLM to grasp video moments

B Huang, X Wang, H Chen… - Proceedings of the …, 2024 - openaccess.thecvf.com

Large language models (LLMs) have shown remarkable text understanding capabilities
which have been extended as Video LLMs to handle video data for comprehending visual …

Cited by 82

VATEX: A large-scale, high-quality multilingual dataset for video-and-language research

X Wang, J Wu, J Chen, L Li… - Proceedings of the …, 2019 - openaccess.thecvf.com

We present a new large-scale multilingual video description dataset, VATEX, which contains
over 41,250 videos and 825,000 captions in both English and Chinese. Among the captions …

Cited by 622

Temporal sentence grounding in videos: A survey and future directions

H Zhang, A Sun, W Jing, JT Zhou - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

Temporal sentence grounding in videos (TSGV), aka, natural language video localization
(NLVL) or video moment retrieval (VMR), aims to retrieve a temporal moment that …

Cited by 54

TubeDETR: Spatio-temporal video grounding with transformers

A Yang, A Miech, J Sivic, I Laptev… - Proceedings of the …, 2022 - openaccess.thecvf.com

We consider the problem of localizing a spatio-temporal tube in a video corresponding to a
given text query. This is a challenging task that requires the joint and efficient modeling of …

Cited by 109

Reinforced cross-modal matching and self-supervised imitation learning for vision-language navigation

X Wang, Q Huang, A Celikyilmaz… - Proceedings of the …, 2019 - openaccess.thecvf.com

Vision-language navigation (VLN) is the task of navigating an embodied agent to carry out
natural language instructions inside real 3D environments. In this paper, we study how to …

Cited by 619
