A comprehensive survey on segment anything model for vision and beyond
Artificial intelligence (AI) is evolving towards artificial general intelligence, which refers to the
ability of an AI system to perform a wide range of tasks and exhibit a level of intelligence …
Cuing without sharing: A federated cued speech recognition framework via mutual knowledge distillation
Cued Speech (CS) is a visual coding tool to encode spoken languages at the phonetic level,
which combines lip-reading and hand gestures to effectively assist communication among …
A Survey on Deep Multi-modal Learning for Body Language Recognition and Generation
Body language (BL) refers to the non-verbal communication expressed through physical
movements, gestures, facial expressions, and postures. It is a form of communication that …
Computation and parameter efficient multi-modal fusion transformer for cued speech recognition
Cued Speech (CS) is a pure visual coding method used by hearing-impaired people that
combines lip reading with several specific hand shapes to make the spoken language …
Re-synchronization using the hand preceding model for multi-modal fusion in automatic continuous cued speech recognition
Cued Speech (CS) is an augmented lip reading system complemented by hand coding, and
it is very helpful to the deaf people. Automatic CS recognition can help communications …
Cross-modal knowledge distillation method for automatic cued speech recognition
J Wang, Z Tang, X Li, M Yu, Q Fang, L Liu - arXiv preprint arXiv …, 2021 - arxiv.org
Cued Speech (CS) is a visual communication system for the deaf or hearing impaired
people. It combines lip movements with hand cues to obtain a complete phonetic repertoire …
Residual-guided personalized speech synthesis based on face image
J Wang, Z Wang, X Hu, X Li, Q Fang… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Previous works derive personalized speech features by training the model on a large
dataset composed of his/her audio sounds. It was reported that face information has a strong …
Memory-augmented contrastive learning for talking head generation
J Wang, Y Zhao, H Fan, T Xu, Q Li… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Given one reference facial image and a piece of speech as input, talking head generation
aims to synthesize a realistic-looking talking head video. However, generating a lip …
Cross-modal mutual learning for cued speech recognition
Automatic Cued Speech Recognition (ACSR) provides an intelligent human-machine
interface for visual communications, where the Cued Speech (CS) system utilizes lip …
Multistream neural architectures for cued speech recognition using a pre-trained visual feature extractor and constrained ctc decoding
This paper proposes a simple and effective approach for automatic recognition of Cued
Speech (CS), a visual communication tool that helps people with hearing impairment to …