Multimodal variational auto-encoder based audio-visual segmentation

Y Mao, J Zhang, M Xiang… - Proceedings of the …, 2023 - openaccess.thecvf.com
We propose an Explicit Conditional Multimodal Variational Auto-Encoder
(ECMVAE) for audio-visual segmentation (AVS), aiming to segment sound sources in the …

Beyond Mahalanobis distance for textual OOD detection

P Colombo, E Dadalto, G Staerman… - Advances in …, 2022 - proceedings.neurips.cc
As the number of AI systems keeps growing, it is fundamental to implement and develop
efficient control mechanisms to ensure the safe and proper functioning of machine learning …

SMIN: Semi-supervised multi-modal interaction network for conversational emotion recognition

Z Lian, B Liu, J Tao - IEEE Transactions on Affective Computing, 2022 - ieeexplore.ieee.org
Conversational emotion recognition is a crucial research topic in human-computer
interactions. Due to the heavy annotation cost and inevitable label ambiguity, collecting …

Multimodal sentiment analysis with two-phase multi-task learning

B Yang, L Wu, J Zhu, B Shao, X Lin… - IEEE/ACM Transactions …, 2022 - ieeexplore.ieee.org
Multimodal Sentiment Analysis (MSA) is a challenging research area that studies sentiment
expressed from multiple heterogeneous modalities. Given those pre-trained language …

InfoLM: A new metric to evaluate summarization & data2text generation

PJA Colombo, C Clavel, P Piantanida - Proceedings of the AAAI …, 2022 - ojs.aaai.org
Assessing the quality of natural language generation (NLG) systems through human
annotation is very expensive. Additionally, human annotation campaigns are time …

Learning disentangled textual representations via statistical measures of similarity

P Colombo, G Staerman, N Noiry… - arXiv preprint arXiv …, 2022 - arxiv.org
When working with textual data, a natural application of disentangled representations is fair
classification where the goal is to make predictions without being biased (or influenced) by …

Automatic text evaluation through the lens of Wasserstein barycenters

P Colombo, G Staerman, C Clavel… - arXiv preprint arXiv …, 2021 - arxiv.org
A new metric, BaryScore, to evaluate text generation based on deep contextualized
embeddings (e.g., BERT, Roberta, ELMo) is introduced. This metric is motivated by a new …

What are the best systems? New perspectives on NLP benchmarking

P Colombo, N Noiry, E Irurozki… - Advances in neural …, 2022 - proceedings.neurips.cc
In Machine Learning, a benchmark refers to an ensemble of datasets associated
with one or multiple metrics together with a way to aggregate different systems …

AMOA: Global acoustic feature enhanced modal-order-aware network for multimodal sentiment analysis

Z Li, Y Zhou, W Zhang, Y Liu, C Yang… - Proceedings of the …, 2022 - aclanthology.org
In recent years, multimodal sentiment analysis (MSA) has attracted more and more interest,
which aims to predict the sentiment polarity expressed in a video. Existing methods typically …

Learning emotional prompt features with multiple views for visual emotion analysis

Q Xu, Y Wei, S Yuan, J Wu, L Wang, C Wu - Information Fusion, 2024 - Elsevier
Visual emotion analysis (VEA), aiming to detect the emotions behind images, has gained
increasing attention with the development of online social media. Recent studies in prompt …