Data augmentation for audio-visual emotion recognition with an efficient multimodal conditional GAN

F Ma, Y Li, S Ni, SL Huang, L Zhang - Applied Sciences, 2022 - mdpi.com
Audio-visual emotion recognition is the task of identifying human emotional states by
combining the audio and visual modalities simultaneously, which plays an …

On universal features for high-dimensional learning and inference

SL Huang, A Makur, GW Wornell, L Zheng - arXiv preprint arXiv …, 2019 - arxiv.org
We consider the problem of identifying universal low-dimensional features from high-
dimensional data for inference tasks in settings involving learning. For such problems, we …

Learning better representations for audio-visual emotion recognition with common information

F Ma, W Zhang, Y Li, SL Huang, L Zhang - Applied Sciences, 2020 - mdpi.com
Audio-visual emotion recognition aims to distinguish human emotional states by integrating
the audio and visual data acquired in the expression of emotions. It is crucial for facilitating …

HGR Correlation Pooling Fusion Framework for Recognition and Classification in Multimodal Remote Sensing Data

H Zhang, SL Huang, EE Kuruoglu - Remote Sensing, 2024 - mdpi.com
This paper investigates remote sensing data recognition and classification with multimodal
data fusion. Aiming at the problems of low recognition and classification accuracy and the …

Robust cross-modal remote sensing image retrieval via maximal correlation augmentation

Z Wang, X Wang, G Li, C Li - IEEE Transactions on Geoscience …, 2024 - ieeexplore.ieee.org
Most of the existing studies regarding cross-modal content-based remote sensing image
retrieval (CM-CBRSIR) focus on reducing/enlarging the Euclidean distances of cross-modal …

A method of audio-visual person verification by mining connections between time series

P Sun, S Zhang, Z Liu, Y Yuan, T Zhang… - Proc …, 2023 - isca-archive.org
It has already been observed that audio-visual embedding is more robust than uni-modality
embedding for person verification. But the relationship of keyframes in time series between …

Learning Audio-Visual embedding for Person Verification in the Wild

P Sun, S Zhang, Z Liu, Y Yuan, T Zhang… - arXiv preprint arXiv …, 2022 - arxiv.org
It has already been observed that audio-visual embedding is more robust than uni-modality
embedding for person verification. Here, we propose a novel audio-visual strategy that …

Generalized product-of-experts for learning multimodal representations in noisy environments

A Joshi, N Gupta, J Shah, B Bhattarai, A Modi… - Proceedings of the …, 2022 - dl.acm.org
A real-world application or setting involves interaction between different modalities (e.g.,
video, speech, text). In order to process the multimodal information automatically and use it …

More than Vanilla Fusion: a Simple, Decoupling-free, Attention Module for Multimodal Fusion Based on Signal Theory

P Sun, Y Zhang, Z Liu, D Chen, H Zhang - arXiv preprint arXiv:2312.07212, 2023 - arxiv.org
Vanilla fusion methods still dominate a large share of mainstream audio-visual
tasks. However, the effectiveness of vanilla fusion from a theoretical perspective is still worth …

A semi-supervised learning approach for visual question answering based on maximal correlation

S Yin, F Ma, SL Huang - 2021 IEEE International Conference …, 2021 - ieeexplore.ieee.org
In this paper, we propose a semi-supervised learning approach for the Visual Question
Answering (VQA) task based on maximal correlation. Instead of training the VQA model with …