Multimodal co-learning: Challenges, applications with datasets, recent advances and future directions

A Rahate, R Walambe, S Ramanna, K Kotecha - Information Fusion, 2022 - Elsevier
Multimodal deep learning systems that employ multiple modalities like text, image, audio,
video, etc., are showing better performance than individual modalities (i.e., unimodal) …

Multimodal co-learning meets remote sensing: Taxonomy, state of the art, and future works

N Kieu, K Nguyen, A Nazib, T Fernando… - IEEE Journal of …, 2024 - ieeexplore.ieee.org
In remote sensing (RS), multiple modalities of data are usually available, e.g., RGB,
multispectral, hyperspectral, light detection and ranging (LiDAR), and synthetic aperture …

Cultural adaptation of recipes

Y Cao, Y Kementchedjhieva, R Cui… - Transactions of the …, 2024 - direct.mit.edu
Building upon the considerable advances in Large Language Models (LLMs), we are now
equipped to address more sophisticated tasks demanding a nuanced understanding of …

ARGUS: Visualization of AI-Assisted Task Guidance in AR

S Castelo, J Rulff, E McGowan, B Steers… - … on Visualization and …, 2023 - ieeexplore.ieee.org
The concept of augmented reality (AR) assistants has captured the human imagination for
decades, becoming a staple of modern science fiction. To pursue this goal, it is necessary to …

Multimodality for NLP-centered applications: Resources, advances and frontiers

M Garg, S Wazarkar, M Singh… - Proceedings of the …, 2022 - aclanthology.org
With the development of multimodal systems and natural language generation techniques,
the resurgence of multimodal datasets has attracted significant research interests, which …

Aligning actions across recipe graphs

L Donatelli, T Schmidt, D Biswas, A Köhn… - Proceedings of the …, 2021 - aclanthology.org
Recipe texts are an idiosyncratic form of instructional language that pose unique challenges
for automatic understanding. One challenge is that a cooking step in one recipe can be …

Transferring knowledge from text to video: Zero-shot anticipation for procedural actions

F Sener, R Saraf, A Yao - IEEE Transactions on Pattern Analysis …, 2022 - ieeexplore.ieee.org
Can we teach a robot to recognize and make predictions for activities that it has never seen
before? We tackle this problem by learning models for video from text. This paper presents a …

Benchmarking procedural language understanding for low-resource languages: A case study on Turkish

A Uzunoglu, GG Şahin - arXiv preprint arXiv:2309.06698, 2023 - arxiv.org
Understanding procedural natural language (e.g., step-by-step instructions) is a crucial step
to execution and planning. However, while there are ample corpora and downstream tasks …

Modeling temporal-modal entity graph for procedural multimodal machine comprehension

H Zhang, Z Zhang, Y Zhang, J Wang, Y Li… - arXiv preprint arXiv …, 2022 - arxiv.org
Procedural Multimodal Documents (PMDs) organize textual instructions and corresponding
images step by step. Comprehending PMDs and inducing their representations for the …

Reasoning about procedures with natural language processing: A tutorial

L Zhang - arXiv preprint arXiv:2205.07455, 2022 - arxiv.org
This tutorial provides a comprehensive and in-depth view of the research on procedures,
primarily in Natural Language Processing. A procedure is a sequence of steps intended to …