Multimodal learning with transformers: A survey

P Xu, X Zhu, DA Clifton - IEEE Transactions on Pattern Analysis …, 2023 - ieeexplore.ieee.org
Transformer is a promising neural network learner, and has achieved great success in
various machine learning tasks. Thanks to the recent prevalence of multimodal applications …

How to reuse and compose knowledge for a lifetime of tasks: A survey on continual learning and functional composition

JA Mendez, E Eaton - arXiv preprint arXiv:2207.07730, 2022 - arxiv.org
A major goal of artificial intelligence (AI) is to create an agent capable of acquiring a general
understanding of the world. Such an agent would require the ability to continually …

Dynamically transformed instance normalization network for generalizable person re-identification

B Jiao, L Liu, L Gao, G Lin, L Yang, S Zhang… - European conference on …, 2022 - Springer
Existing person re-identification methods often suffer significant performance degradation on
unseen domains, which fuels interest in domain generalizable person re-identification (DG …

Interpretability for reliable, efficient, and self-cognitive DNNs: From theories to applications

X Kang, J Guo, B Song, B Cai, H Sun, Z Zhang - Neurocomputing, 2023 - Elsevier
In recent years, remarkable achievements have been made in artificial intelligence tasks
and applications based on deep neural networks (DNNs), especially in the fields of vision …

CX-ToM: Counterfactual explanations with theory-of-mind for enhancing human trust in image recognition models

AR Akula, K Wang, C Liu, S Saba-Sadiya, H Lu… - iScience, 2022 - cell.com
We propose CX-ToM, short for counterfactual explanations with theory-of-mind, a new
explainable AI (XAI) framework for explaining decisions made by a deep convolutional …

Knowledge-augmented deep learning and its applications: A survey

Z Cui, T Gao, K Talamadupula… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Deep learning models, though having achieved great success in many different fields over
the past years, are usually data-hungry, fail to perform well on unseen samples, and lack …

Reconstructing action-conditioned human-object interactions using commonsense knowledge priors

X Wang, G Li, YL Kuo, M Kocabas… - … Conference on 3D …, 2022 - ieeexplore.ieee.org
We present a method for inferring diverse 3D models of human-object interactions from
images. Reasoning about how humans interact with objects in complex scenes from a single …

EQA-MX: Embodied question answering using multimodal expression

MM Islam, A Gladstone, R Islam… - The Twelfth International …, 2023 - openreview.net
Humans predominantly use verbal utterances and nonverbal gestures (e.g., eye gaze and
pointing gestures) in their natural interactions. For instance, pointing gestures and verbal …

Compositional Substitutivity of Visual Reasoning for Visual Question Answering

C Li, Z Li, C Jing, Y Wu, M Zhai, Y Jia - European Conference on Computer …, 2024 - Springer
Compositional generalization has received much attention in vision-and-language and
visual reasoning recently. Substitutivity, the capability to generalize to novel compositions …

PATRON: Perspective-aware multitask model for referring expression grounding using embodied multimodal cues

MM Islam, A Gladstone, T Iqbal - … of the AAAI Conference on Artificial …, 2023 - ojs.aaai.org
Humans naturally use referring expressions with verbal utterances and nonverbal gestures
to refer to objects and events. As these referring expressions can be interpreted differently …