Foundations & trends in multimodal machine learning: Principles, challenges, and open questions
Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …
A survey of multimodal information fusion for smart healthcare: Mapping the journey from data to wisdom
Multimodal medical data fusion has emerged as a transformative approach in smart
healthcare, enabling a comprehensive understanding of patient health and personalized …
MultiMAE: Multi-modal multi-task masked autoencoders
We propose a pre-training strategy called Multi-modal Multi-task Masked Autoencoders
(MultiMAE). It differs from standard Masked Autoencoding in two key aspects: I) it can …
Disentangled representation learning
Disentangled Representation Learning (DRL) aims to learn a model capable of identifying
and disentangling the underlying factors hidden in the observable data in representation …
Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions
Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …
MeshTalk: 3D face animation from speech using cross-modality disentanglement
This paper presents a generic method for generating full facial 3D animation from speech.
Existing approaches to audio-driven facial animation exhibit uncanny or static upper face …
Machine learning and deep learning applications in microbiome research
The many microbial communities around us form interactive and dynamic ecosystems called
microbiomes. Though concealed from the naked eye, microbiomes govern and influence …
Multimodal variational auto-encoder based audio-visual segmentation
We propose an Explicit Conditional Multimodal Variational Auto-Encoder
(ECMVAE) for audio-visual segmentation (AVS), aiming to segment sound sources in the …
Multimodal conditional image synthesis with product-of-experts GANs
Existing conditional image synthesis frameworks generate images based on user inputs in a
single modality, such as text, segmentation, or sketch. They do not allow users to …
Contrastive machine learning reveals the structure of neuroanatomical variation within autism
Autism spectrum disorder (ASD) is highly heterogeneous. Identifying systematic individual
differences in neuroanatomy could inform diagnosis and personalized interventions. The …