- Academic Search

Knowledge graphs meet multi-modal learning: A comprehensive survey

Z Chen, Y Zhang, Y Fang, Y Geng, L Guo… - arxiv preprint arxiv …, 2024 - arxiv.org

Knowledge Graphs (KGs) play a pivotal role in advancing various AI applications, with the
semantic web community's exploration into multi-modal dimensions unlocking new avenues …

Cited by 44 · All 2 versions

Frozen in time: A joint video and image encoder for end-to-end retrieval

M Bain, A Nagrani, G Varol… - Proceedings of the …, 2021 - openaccess.thecvf.com

Our objective in this work is video-text retrieval, in particular a joint embedding that enables
efficient text-to-video retrieval. The challenges in this area include the design of the visual …

Cited by 1162 · All 12 versions

AutoAD II: The sequel - who, when, and what in movie audio description

T Han, M Bain, A Nagrani, G Varol… - Proceedings of the …, 2023 - openaccess.thecvf.com

Audio Description (AD) is the task of generating descriptions of visual content, at suitable
time intervals, for the benefit of visually impaired audiences. For movies, this presents …

Cited by 39 · All 7 versions

Long-form video-language pre-training with multimodal temporal contrastive learning

Y Sun, H Xue, R Song, B Liu… - Advances in neural …, 2022 - proceedings.neurips.cc

Large-scale video-language pre-training has shown significant improvement in video-language
understanding tasks. Previous studies of video-language pretraining mainly focus …

Cited by 72 · All 6 versions

AutoAD: Movie description in context

T Han, M Bain, A Nagrani, G Varol… - Proceedings of the …, 2023 - openaccess.thecvf.com

The objective of this paper is an automatic Audio Description (AD) model that ingests movies
and outputs AD in text form. Generating high-quality movie AD is challenging due to the …

Cited by 58 · All 7 versions

Towards long-form video understanding

CY Wu, P Krahenbuhl - … of the IEEE/CVF Conference on …, 2021 - openaccess.thecvf.com

Our world offers a never-ending stream of visual stimuli, yet today's vision systems only
accurately recognize patterns within a few seconds. These systems understand the present …

Cited by 165 · All 12 versions

Learning audio-video modalities from image captions

A Nagrani, PH Seo, B Seybold, A Hauth… - … on Computer Vision, 2022 - Springer

There has been a recent explosion of large-scale image-text datasets, as images with alt-text
captions can be easily obtained online. Obtaining large-scale, high quality data for video …

Cited by 97 · All 8 versions

Long movie clip classification with state-space video models

MM Islam, G Bertasius - European Conference on Computer Vision, 2022 - Springer

Most modern video recognition models are designed to operate on short video clips (e.g., 5–10 s
in length). Thus, it is challenging to apply such models to long movie understanding …

Cited by 98 · All 3 versions

Computational media intelligence: Human-centered machine analysis of media

K Somandepalli, T Guha, VR Martinez… - Proceedings of the …, 2021 - ieeexplore.ieee.org

Media is created by humans for humans to tell stories. There exists a natural and imminent
need for creating human-centered media analytics to illuminate the stories being told and to …

Cited by 41 · All 4 versions

A CLIP-hitchhiker's guide to long video retrieval

M Bain, A Nagrani, G Varol, A Zisserman - arxiv preprint arxiv:2205.08508, 2022 - arxiv.org

Our goal in this paper is the adaptation of image-text models for long video retrieval. Recent
works have demonstrated state-of-the-art performance in video retrieval by adopting CLIP …

Cited by 70 · All 2 versions

Condensed movies: Story based retrieval with contextual embeddings