- Academic Search

A Yang, A Nagrani, I Laptev, J Sivic… - Advances in Neural …, 2024 - proceedings.neurips.cc

Segmenting untrimmed videos into chapters enables users to quickly navigate to the
information of their interest. This important topic has been understudied due to the lack of …

Salva Cita Citato da 28 Articoli correlati Tutte e 19 le versioni Versione HTML

[Free GPT-4]

[PDF] arxiv.org

Movienet: A holistic dataset for movie understanding

Q Huang, Y **ong, A Rao, J Wang, D Lin - Computer Vision–ECCV 2020 …, 2020 - Springer

Recent years have seen remarkable advances in visual understanding. However, how to
understand a story-based long video with artistic styles, eg movie, remains challenging. In …

Salva Cita Citato da 264 Articoli correlati Tutte e 4 le versioni

[Free GPT-4]

[PDF] thecvf.com

Efficient movie scene detection using state-space transformers

MM Islam, M Hasan, KS Athrey… - Proceedings of the …, 2023 - openaccess.thecvf.com

The ability to distinguish between different movie scenes is critical for understanding the
storyline of a movie. However, accurately detecting movie scenes is often challenging as it …

Salva Cita Citato da 47 Articoli correlati Tutte e 6 le versioni Versione HTML

[Free GPT-4]

[PDF] ieee.org

Computational media intelligence: Human-centered machine analysis of media

K Somandepalli, T Guha, VR Martinez… - Proceedings of the …, 2021 - ieeexplore.ieee.org

Media is created by humans for humans to tell stories. There exists a natural and imminent
need for creating human-centered media analytics to illuminate the stories being told and to …

Salva Cita Citato da 41 Articoli correlati Tutte e 4 le versioni

[Free GPT-4]

[PDF] arxiv.org

Cross-modal consensus network for weakly supervised temporal action localization

FT Hong, JC Feng, D Xu, Y Shan… - Proceedings of the 29th …, 2021 - dl.acm.org

Weakly supervised temporal action localization (WS-TAL) is a challenging task that aims to
localize action instances in the given video with video-level categorical supervision …

Salva Cita Citato da 93 Articoli correlati Tutte e 4 le versioni

[Free GPT-4]

[PDF] arxiv.org

Category-level 6d object pose estimation via cascaded relation and recurrent reconstruction networks

J Wang, K Chen, Q Dou - 2021 IEEE/RSJ International …, 2021 - ieeexplore.ieee.org

Category-level 6D pose estimation, aiming to predict the location and orientation of unseen
object instances, is fundamental to many scenarios such as robotic manipulation and …

Salva Cita Citato da 95 Articoli correlati Tutte e 5 le versioni

[Free GPT-4]

[PDF] thecvf.com

Shot contrastive self-supervised learning for scene boundary detection

S Chen, X Nie, D Fan, D Zhang… - Proceedings of the …, 2021 - openaccess.thecvf.com

Scenes play a crucial role in breaking the storyline of movies and TV episodes into
semantically cohesive parts. However, given their complex temporal structure, finding scene …

Salva Cita Citato da 81 Articoli correlati Tutte e 7 le versioni Versione HTML

[Free GPT-4]

[PDF] arxiv.org

Sep-stereo: Visually guided stereophonic audio generation by associating source separation

H Zhou, X Xu, D Lin, X Wang, Z Liu - … , Glasgow, UK, August 23–28, 2020 …, 2020 - Springer

Stereophonic audio is an indispensable ingredient to enhance human auditory experience.
Recent research has explored the usage of visual information as guidance to generate …

Salva Cita Citato da 95 Articoli correlati Tutte e 5 le versioni

[Free GPT-4]

[PDF] aaai.org

Towards global video scene segmentation with context-aware transformer

Y Yang, Y Huang, W Guo, B Xu, D **a - Proceedings of the AAAI …, 2023 - ojs.aaai.org

Videos such as movies or TV episodes usually need to divide the long storyline into
cohesive units, ie, scenes, to facilitate the understanding of video semantics. The key …

Salva Cita Citato da 28 Articoli correlati Tutte e 3 le versioni Versione HTML

[Free GPT-4]

[PDF] thecvf.com

Condensed movies: Story based retrieval with contextual embeddings

M Bain, A Nagrani, A Brown… - Proceedings of the …, 2020 - openaccess.thecvf.com

Our objective in this work is the long range understandingof the narrative structure of
movies. Instead of considering the entire movie, we propose to learn from thekey scenes' of …

Salva Cita Citato da 112 Articoli correlati Tutte e 10 le versioni Versione HTML

Crea avviso

Cita

Ricerca avanzata

Salvato in La mia biblioteca

A local-to-global approach to multi-modal movie scene segmentation

Vidchapters-7m: Video chapters at scale

Movienet: A holistic dataset for movie understanding

Efficient movie scene detection using state-space transformers

Computational media intelligence: Human-centered machine analysis of media

Cross-modal consensus network for weakly supervised temporal action localization

Category-level 6d object pose estimation via cascaded relation and recurrent reconstruction networks

Shot contrastive self-supervised learning for scene boundary detection

Sep-stereo: Visually guided stereophonic audio generation by associating source separation

Towards global video scene segmentation with context-aware transformer

Condensed movies: Story based retrieval with contextual embeddings