Foundations & trends in multimodal machine learning: Principles, challenges, and open questions

PP Liang, A Zadeh, LP Morency - ACM Computing Surveys, 2024 - dl.acm.org
Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …

A survey of multimodal information fusion for smart healthcare: Mapping the journey from data to wisdom

T Shaik, X Tao, L Li, H Xie, JD Velásquez - Information Fusion, 2024 - Elsevier
Multimodal medical data fusion has emerged as a transformative approach in smart
healthcare, enabling a comprehensive understanding of patient health and personalized …

MultiMAE: Multi-modal multi-task masked autoencoders

R Bachmann, D Mizrahi, A Atanov, A Zamir - European Conference on …, 2022 - Springer
We propose a pre-training strategy called Multi-modal Multi-task Masked Autoencoders
(MultiMAE). It differs from standard Masked Autoencoding in two key aspects: I) it can …

Disentangled representation learning

X Wang, H Chen, Z Wu, W Zhu - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Disentangled Representation Learning (DRL) aims to learn a model capable of identifying
and disentangling the underlying factors hidden in the observable data in representation …

Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions

PP Liang, A Zadeh, LP Morency - arXiv preprint arXiv:2209.03430, 2022 - arxiv.org
Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …

MeshTalk: 3D face animation from speech using cross-modality disentanglement

A Richard, M Zollhöfer, Y Wen… - Proceedings of the …, 2021 - openaccess.thecvf.com
This paper presents a generic method for generating full facial 3D animation from speech.
Existing approaches to audio-driven facial animation exhibit uncanny or static upper face …

Machine learning and deep learning applications in microbiome research

R Hernández Medina, S Kutuzova… - ISME …, 2022 - academic.oup.com
The many microbial communities around us form interactive and dynamic ecosystems called
microbiomes. Though concealed from the naked eye, microbiomes govern and influence …

Multimodal variational auto-encoder based audio-visual segmentation

Y Mao, J Zhang, M Xiang… - Proceedings of the …, 2023 - openaccess.thecvf.com
We propose an Explicit Conditional Multimodal Variational Auto-Encoder
(ECMVAE) for audio-visual segmentation (AVS), aiming to segment sound sources in the …

Multimodal conditional image synthesis with product-of-experts GANs

X Huang, A Mallya, TC Wang, MY Liu - European conference on computer …, 2022 - Springer
Existing conditional image synthesis frameworks generate images based on user inputs in a
single modality, such as text, segmentation, or sketch. They do not allow users to …

Contrastive machine learning reveals the structure of neuroanatomical variation within autism

A Aglinskas, JK Hartshorne, S Anzellotti - Science, 2022 - science.org
Autism spectrum disorder (ASD) is highly heterogeneous. Identifying systematic individual
differences in neuroanatomy could inform diagnosis and personalized interventions. The …