Google Scholar

Temporal sentence grounding in videos: A survey and future directions

H Zhang, A Sun, W Jing, JT Zhou - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

Temporal sentence grounding in videos (TSGV), aka, natural language video localization
(NLVL) or video moment retrieval (VMR), aims to retrieve a temporal moment that …

Save Cite Cited by 55 Related articles All 9 versions

[PDF] thecvf.com

VTimeLLM: Empower LLM to grasp video moments

B Huang, X Wang, H Chen… - Proceedings of the …, 2024 - openaccess.thecvf.com

Large language models (LLMs) have shown remarkable text understanding capabilities
which have been extended as Video LLMs to handle video data for comprehending visual …

Save Cite Cited by 86 Related articles All 7 versions View as HTML

[PDF] arxiv.org

A review of deep learning for video captioning

M Abdar, M Kollati, S Kuraparthi… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org

Video captioning (VC) is a fast-moving, cross-disciplinary area of research that comprises
contributions from domains such as computer vision, natural language processing …

Save Cite Cited by 20 Related articles All 7 versions

[PDF] thecvf.com

AI choreographer: Music conditioned 3D dance generation with AIST++

R Li, S Yang, DA Ross… - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com

We present AIST++, a new multi-modal dataset of 3D dance motion and music, along with
FACT, a Full-Attention Cross-modal Transformer network for generating 3D dance motion …

Save Cite Cited by 515 Related articles All 7 versions View as HTML

[PDF] thecvf.com

Graph convolutional networks for temporal action localization

R Zeng, W Huang, M Tan, Y Rong… - Proceedings of the …, 2019 - openaccess.thecvf.com

Most state-of-the-art action localization systems process each action proposal individually,
without explicitly exploiting their relations during learning. However, the relations between …

Save Cite Cited by 625 Related articles All 8 versions View as HTML

[PDF] thecvf.com

Weakly supervised temporal sentence grounding with Gaussian-based contrastive proposal learning

M Zheng, Y Huang, Q Chen… - Proceedings of the …, 2022 - openaccess.thecvf.com

Temporal sentence grounding aims to detect the most salient moment corresponding to the
natural language query from untrimmed videos. As labeling the temporal boundaries is labor …

Save Cite Cited by 100 Related articles All 5 versions View as HTML

[PDF] thecvf.com

You can ground earlier than see: An effective and efficient pipeline for temporal sentence grounding in compressed videos

X Fang, D Liu, P Zhou, G Nan - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

Given an untrimmed video, temporal sentence grounding (TSG) aims to locate a target
moment semantically according to a sentence query. Although previous respectable works …

Save Cite Cited by 45 Related articles All 7 versions View as HTML

[PDF] nih.gov

Video captioning using global-local representation

L Yan, S Ma, Q Wang, Y Chen, X Zhang… - … on Circuits and …, 2022 - ieeexplore.ieee.org

Video captioning is a challenging task as it needs to accurately transform visual
understanding into natural language description. To date, state-of-the-art methods …

Save Cite Cited by 97 Related articles All 6 versions

[PDF] aaai.org

To find where you talk: Temporal sentence localization in video with attention based location regression

Y Yuan, T Mei, W Zhu - Proceedings of the AAAI Conference on Artificial …, 2019 - aaai.org

We have witnessed the tremendous growth of videos over the Internet, where most of these
videos are typically paired with abundant sentence descriptions, such as video titles …

Save Cite Cited by 362 Related articles All 8 versions View as HTML

[PDF] thecvf.com

Multi-modal dense video captioning

V Iashin, E Rahtu - … of the IEEE/CVF conference on …, 2020 - openaccess.thecvf.com

Dense video captioning is a task of localizing interesting events from an untrimmed video
and producing textual description (captions) for each localized event. Most of the previous …

Save Cite Cited by 216 Related articles All 10 versions View as HTML
