A comprehensive survey on Segment Anything Model for vision and beyond

C Zhang, L Liu, Y Cui, G Huang, W Lin, Y Yang… - arXiv preprint arXiv …, 2023 - arxiv.org
Artificial intelligence (AI) is evolving towards artificial general intelligence, which refers to the
ability of an AI system to perform a wide range of tasks and exhibit a level of intelligence …

Cuing without sharing: A federated cued speech recognition framework via mutual knowledge distillation

Y Zhang, L Liu, L Liu - Proceedings of the 31st ACM International …, 2023 - dl.acm.org
Cued Speech (CS) is a visual coding tool to encode spoken languages at the phonetic level,
which combines lip-reading and hand gestures to effectively assist communication among …

A Survey on Deep Multi-modal Learning for Body Language Recognition and Generation

L Liu, L Gao, W Lei, F Ma, X Lin, J Wang - arXiv preprint arXiv:2308.08849, 2023 - arxiv.org
Body language (BL) refers to the non-verbal communication expressed through physical
movements, gestures, facial expressions, and postures. It is a form of communication that …

Computation and parameter efficient multi-modal fusion transformer for cued speech recognition

L Liu, L Liu, H Li - IEEE/ACM Transactions on Audio, Speech …, 2024 - ieeexplore.ieee.org
Cued Speech (CS) is a pure visual coding method used by hearing-impaired people that
combines lip reading with several specific hand shapes to make the spoken language …

Re-synchronization using the hand preceding model for multi-modal fusion in automatic continuous cued speech recognition

L Liu, G Feng, D Beautemps… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
Cued Speech (CS) is an augmented lip reading system complemented by hand coding, and
it is very helpful to the deaf people. Automatic CS recognition can help communications …

Cross-modal knowledge distillation method for automatic cued speech recognition

J Wang, Z Tang, X Li, M Yu, Q Fang, L Liu - arXiv preprint arXiv …, 2021 - arxiv.org
Cued Speech (CS) is a visual communication system for the deaf or hearing impaired
people. It combines lip movements with hand cues to obtain a complete phonetic repertoire …

Residual-guided personalized speech synthesis based on face image

J Wang, Z Wang, X Hu, X Li, Q Fang… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Previous works derive personalized speech features by training the model on a large
dataset composed of his/her audio sounds. It was reported that face information has a strong …

Memory-augmented contrastive learning for talking head generation

J Wang, Y Zhao, H Fan, T Xu, Q Li… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Given one reference facial image and a piece of speech as input, talking head generation
aims to synthesize a realistic-looking talking head video. However, generating a lip …

Cross-modal mutual learning for cued speech recognition

L Liu, L Liu - ICASSP 2023-2023 IEEE International Conference …, 2023 - ieeexplore.ieee.org
Automatic Cued Speech Recognition (ACSR) provides an intelligent human-machine
interface for visual communications, where the Cued Speech (CS) system utilizes lip …

Multistream neural architectures for cued speech recognition using a pre-trained visual feature extractor and constrained CTC decoding

S Sankar, D Beautemps… - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
This paper proposes a simple and effective approach for automatic recognition of Cued
Speech (CS), a visual communication tool that helps people with hearing impairment to …