Foundations & trends in multimodal machine learning: Principles, challenges, and open questions

PP Liang, A Zadeh, LP Morency - ACM Computing Surveys, 2024 - dl.acm.org
Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …

Foundations and trends in multimodal machine learning: Principles, challenges, and open questions

PP Liang, A Zadeh, LP Morency - arXiv preprint arXiv:2209.03430, 2022 - arxiv.org
Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …

How2: a large-scale dataset for multimodal language understanding

R Sanabria, O Caglayan, S Palaskar, D Elliott… - arXiv preprint arXiv …, 2018 - arxiv.org
In this paper, we introduce How2, a multimodal collection of instructional videos with English
subtitles and crowdsourced Portuguese translations. We also present integrated sequence …

Abstractive summarization: An overview of the state of the art

S Gupta, SK Gupta - Expert Systems with Applications, 2019 - Elsevier
Summarization, which reduces the size of a document while preserving its meaning, is one
of the most researched areas among the Natural Language Processing (NLP) community …

A survey on multi-modal summarization

A Jangra, S Mukherjee, A Jatowt, S Saha… - ACM Computing …, 2023 - dl.acm.org
The new era of technology has brought us to the point where it is convenient for people to
share their opinions over an abundance of platforms. These platforms have a provision for …

Hierarchical cross-modality semantic correlation learning model for multimodal summarization

L Zhang, X Zhang, J Pan - Proceedings of the AAAI Conference on …, 2022 - ojs.aaai.org
Multimodal summarization with multimodal output (MSMO) generates a summary with both
textual and visual content. Multimodal news report contains heterogeneous contents, which …

Bridging text visualization and mining: A task-driven survey

S Liu, X Wang, C Collins, W Dou… - IEEE Transactions on …, 2018 - ieeexplore.ieee.org
Visual text analytics has recently emerged as one of the most prominent topics in both
academic research and the commercial world. To provide an overview of the relevant …

Abstractive text-image summarization using multi-modal attentional hierarchical RNN

J Chen, H Zhuge - Proceedings of the 2018 Conference on …, 2018 - aclanthology.org
Rapid growth of multi-modal documents on the Internet makes multi-modal summarization
research necessary. Most previous research summarizes texts or images separately. Recent …

Generating audio-visual slideshows from text articles using word concreteness

M Leake, HV Shin, JO Kim, M Agrawala - … of the 2020 CHI Conference on …, 2020 - dl.acm.org
We present a system that automatically transforms text articles into audio-visual slideshows
by leveraging the notion of word concreteness, which measures how strongly a word or …

Exploiting pseudo image captions for multimodal summarization

C Jiang, R Xie, W Ye, J Sun, S Zhang - arXiv preprint arXiv:2305.05496, 2023 - arxiv.org
Cross-modal contrastive learning in vision language pretraining (VLP) faces the challenge
of (partial) false negatives. In this paper, we study this problem from the perspective of …