Multimodal intelligence: Representation learning, information fusion, and applications
Deep learning methods have revolutionized speech recognition, image recognition, and
natural language processing since 2010. Each of these tasks involves a single modality in …
Balanced multimodal learning via on-the-fly gradient modulation
Audio-visual learning helps to comprehensively understand the world, by integrating
different senses. Accordingly, multiple input modalities are expected to boost model …
CTNet: Conversational transformer network for emotion recognition
Emotion recognition in conversation is a crucial topic for its widespread applications in the
field of human-computer interactions. Unlike vanilla emotion recognition of individual …
Speech emotion recognition with multi-task learning
Speech emotion recognition (SER) classifies speech into emotion categories such as
Happy, Angry, Sad and Neutral. Recently, deep learning has been applied to the SER task …
Overview of speaker modeling and its applications: From the lens of deep speaker representation learning
Speaker individuality information is among the most critical elements within speech signals.
By thoroughly and accurately modeling this information, it can be utilized in various …
The VoicePrivacy 2024 Challenge Evaluation Plan
The task of the challenge is to develop a voice anonymization system for speech data which
conceals the speaker's voice identity while protecting linguistic content and emotional states …
Using State of the Art Speaker Recognition and Natural Language Processing Technologies to Detect Alzheimer's Disease and Assess its Severity
In this study, we analyze the use of state-of-the-art technologies for speaker recognition and
natural language processing to detect Alzheimer's Disease (AD) and to assess its severity …
Emotion recognition by fusing time synchronous and time asynchronous representations
In this paper, a novel two-branch neural network model structure is proposed for multimodal
emotion recognition, which consists of a time synchronous branch (TSB) and a time …
Automatic Detection and Assessment of Alzheimer Disease Using Speech and Language Technologies in Low-Resource Scenarios
In this study, we analyze the use of speech and speaker recognition technologies and
natural language processing to detect Alzheimer disease (AD) and estimate mini-mental …
Multimodal emotion recognition using transfer learning from speaker recognition and BERT-based models
Automatic emotion recognition plays a key role in human-computer interaction, as it has the
potential to enrich next-generation artificial intelligence with emotional intelligence. It …