Foundations & trends in multimodal machine learning: Principles, challenges, and open questions

PP Liang, A Zadeh, LP Morency - ACM Computing Surveys, 2024 - dl.acm.org
Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …

Foundations and trends in multimodal machine learning: Principles, challenges, and open questions

PP Liang, A Zadeh, LP Morency - arXiv preprint arXiv:2209.03430, 2022 - arxiv.org
Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …

A survey of automatic text summarization: Progress, process and challenges

MF Mridha, AA Lima, K Nur, SC Das, M Hasan… - IEEE …, 2021 - ieeexplore.ieee.org
With the evolution of the Internet and multimedia technology, the amount of text data has
increased exponentially. This text volume is a precious source of information and knowledge …

An intelligent video analysis method for abnormal event detection in intelligent transportation systems

S Wan, X Xu, T Wang, Z Gu - IEEE Transactions on Intelligent …, 2020 - ieeexplore.ieee.org
Intelligent transportation systems pervasively deploy thousands of video cameras. Analyzing
live video streams from these cameras is of significant importance to public safety. As …

Intelligent character recognition using fully convolutional neural networks

R Ptucha, FP Such, S Pillai, F Brockler, V Singh… - Pattern Recognition, 2019 - Elsevier
The recognition of handwritten text is challenging as there are virtually infinite ways a human
can write the same message. Deep learning approaches for handwriting analysis have …

Multimodal abstractive summarization for How2 videos

S Palaskar, J Libovický, S Gella, F Metze - arXiv preprint arXiv:1906.07901, 2019 - arxiv.org
In this paper, we study abstractive summarization for open-domain videos. Unlike the
traditional text news summarization, the goal is less to "compress" text information but rather …

Move forward and tell: A progressive generator of video descriptions

Y Xiong, B Dai, D Lin - Proceedings of the European …, 2018 - openaccess.thecvf.com
We present an efficient framework that can generate a coherent paragraph to describe a
given video. Previous works on video captioning usually focus on video clips. They typically …

A long video caption generation algorithm for big video data retrieval

S Ding, S Qu, Y Xi, S Wan - Future Generation Computer Systems, 2019 - Elsevier
Videos captured by people are often tied to certain important moments of their lives. But with
the era of big data coming, the time required to retrieve and watch can be daunting. In this …

A review of deep learning for video captioning

M Abdar, M Kollati, S Kuraparthi… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org
Video captioning (VC) is a fast-moving, cross-disciplinary area of research that comprises
contributions from domains such as computer vision, natural language processing …

MultiNet++: Multi-stream feature aggregation and geometric loss strategy for multi-task learning

S Chennupati, G Sistu, S Yogamani… - Proceedings of the …, 2019 - openaccess.thecvf.com
Multi-task learning is commonly used in autonomous driving for solving various visual
perception tasks. It offers significant benefits in terms of both performance and computational …