Foundations & Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions

PP Liang, A Zadeh, LP Morency - ACM Computing Surveys, 2024 - dl.acm.org
Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …

AutoAD II: The sequel - who, when, and what in movie audio description

T Han, M Bain, A Nagrani, G Varol… - Proceedings of the …, 2023 - openaccess.thecvf.com
Audio Description (AD) is the task of generating descriptions of visual content, at suitable
time intervals, for the benefit of visually impaired audiences. For movies, this presents …

Coot: Cooperative hierarchical transformer for video-text representation learning

S Ging, M Zolfaghari, H Pirsiavash… - Advances in neural …, 2020 - proceedings.neurips.cc
Many real-world video-text tasks involve different levels of granularity, such as frames and
words, clips and sentences, or videos and paragraphs, each with distinct semantics. In this …

Multimodal machine learning: A survey and taxonomy

T Baltrušaitis, C Ahuja… - IEEE Transactions on …, 2018 - ieeexplore.ieee.org
Our experience of the world is multimodal - we see objects, hear sounds, feel texture, smell
odors, and taste flavors. Modality refers to the way in which something happens or is …

AutoAD: Movie description in context

T Han, M Bain, A Nagrani, G Varol… - Proceedings of the …, 2023 - openaccess.thecvf.com
The objective of this paper is an automatic Audio Description (AD) model that ingests movies
and outputs AD in text form. Generating high-quality movie AD is challenging due to the …

MovieQA: Understanding stories in movies through question-answering

M Tapaswi, Y Zhu, R Stiefelhagen… - Proceedings of the …, 2016 - openaccess.thecvf.com
We introduce the MovieQA dataset, which aims to evaluate automatic story comprehension
from both video and text. The dataset consists of 14,944 questions about 408 movies with …

Aligning books and movies: Towards story-like visual explanations by watching movies and reading books

Y Zhu, R Kiros, R Zemel, R Salakhutdinov… - Proceedings of the …, 2015 - cv-foundation.org
Books are a rich source of both fine-grained information, what a character, an object or a
scene looks like, as well as high-level semantics, what someone is thinking, feeling and how …

Semantic conditioned dynamic modulation for temporal sentence grounding in videos

Y Yuan, L Ma, J Wang, W Liu… - Advances in Neural …, 2019 - proceedings.neurips.cc
Temporal sentence grounding in videos aims to detect and localize one target video
segment, which semantically corresponds to a given sentence. Existing methods mainly …

To find where you talk: Temporal sentence localization in video with attention-based location regression

Y Yuan, T Mei, W Zhu - Proceedings of the AAAI Conference on Artificial …, 2019 - aaai.org
We have witnessed tremendous growth of videos over the Internet, most of which are
paired with abundant sentence descriptions, such as video titles …