Video description: A survey of methods, datasets, and evaluation metrics

N Aafaq, A Mian, W Liu, SZ Gilani, M Shah - ACM Computing Surveys …, 2019 - dl.acm.org

Video description is the automatic generation of natural language sentences that describe
the contents of a given video. It has applications in human-robot interaction, helping the …

Cited by 255 · All 9 versions

[PDF] springer.com

Video description: A comprehensive survey of deep learning approaches

G Rafiq, M Rafiq, GS Choi - Artificial Intelligence Review, 2023 - Springer

Video description refers to understanding visual content and transforming that acquired
understanding into automatic textual narration. It bridges the key AI fields of computer vision …

Cited by 29 · All 6 versions

[PDF] thecvf.com

Vid2Seq: Large-scale pretraining of a visual language model for dense video captioning

A Yang, A Nagrani, PH Seo, A Miech… - Proceedings of the …, 2023 - openaccess.thecvf.com

In this work, we introduce Vid2Seq, a multi-modal single-stage dense event captioning
model pretrained on narrated videos which are readily-available at scale. The Vid2Seq …

Cited by 243 · All 19 versions

[PDF] thecvf.com

End-to-end generative pretraining for multimodal video captioning

PH Seo, A Nagrani, A Arnab… - Proceedings of the …, 2022 - openaccess.thecvf.com

Recent video and language pretraining frameworks lack the ability to generate sentences.
We present Multimodal Video Generative Pretraining (MV-GPT), a new pretraining …

Cited by 206 · All 5 versions

[PDF] thecvf.com

AutoAD: Movie description in context

T Han, M Bain, A Nagrani, G Varol… - Proceedings of the …, 2023 - openaccess.thecvf.com

The objective of this paper is an automatic Audio Description (AD) model that ingests movies
and outputs AD in text form. Generating high-quality movie AD is challenging due to the …

Cited by 61 · All 7 versions

[PDF] thecvf.com

End-to-end dense video captioning with parallel decoding

T Wang, R Zhang, Z Lu, F Zheng… - Proceedings of the …, 2021 - openaccess.thecvf.com

Dense video captioning aims to generate multiple associated captions with their temporal
locations from the video. Previous methods follow a sophisticated "localize-then-describe" …

Cited by 221 · All 7 versions

[PDF] thecvf.com

AutoAD II: The sequel - who, when, and what in movie audio description

T Han, M Bain, A Nagrani, G Varol… - Proceedings of the …, 2023 - openaccess.thecvf.com

Audio Description (AD) is the task of generating descriptions of visual content, at suitable
time intervals, for the benefit of visually impaired audiences. For movies, this presents …

Cited by 40 · All 7 versions

[PDF] thecvf.com

OmniVid: A generative framework for universal video understanding

J Wang, D Chen, C Luo, B He, L Yuan… - Proceedings of the …, 2024 - openaccess.thecvf.com

The core of video understanding tasks such as recognition, captioning, and tracking is to
automatically detect objects or actions in a video and analyze their temporal evolution …

Cited by 18 · All 7 versions

[PDF] thecvf.com

Exploring visual relationship for image captioning

T Yao, Y Pan, Y Li, T Mei - Proceedings of the European …, 2018 - openaccess.thecvf.com

It is always well believed that modeling relationships between objects would be helpful for
representing and eventually describing an image. Nevertheless, there has not been …

Cited by 1080 · All 11 versions

[PDF] thecvf.com

AutoAD III: The prequel - back to the pixels

T Han, M Bain, A Nagrani, G Varol… - Proceedings of the …, 2024 - openaccess.thecvf.com

Generating Audio Description (AD) for movies is a challenging task that requires
fine-grained visual understanding and an awareness of the characters and their names …

Cited by 16 · All 8 versions

Jointly localizing and describing events for dense video captioning

Video description: A survey of methods, datasets, and evaluation metrics
