Artificial intelligence in the creative industries: a review
This paper reviews the current state of the art in artificial intelligence (AI) technologies and
applications in the context of the creative industries. A brief background of AI, and …
Deep learning for visual speech analysis: A survey
Visual speech, referring to the visual domain of speech, has attracted increasing attention
due to its wide applications, such as public security, medical treatment, military defense, and …
Learning audio-visual speech representation by masked multimodal cluster prediction
Video recordings of speech contain correlated audio and visual information, providing a
strong signal for speech representation learning from the speaker's lip movements and the …
Visual speech recognition for multiple languages in the wild
Visual speech recognition (VSR) aims to recognize the content of speech based on lip
movements, without relying on the audio stream. Advances in deep learning and the …
End-to-end audio-visual speech recognition with conformers
In this work, we present a hybrid CTC/Attention model based on a ResNet-18 and
Convolution-augmented transformer (Conformer), that can be trained in an end-to-end …
Deep audio-visual speech recognition
The goal of this work is to recognise phrases and sentences being spoken by a talking face,
with or without the audio. Unlike previous works that have focussed on recognising a limited …
Group normalization
Batch Normalization (BN) is a milestone technique in the development of deep learning,
enabling various networks to train. However, normalizing along the batch dimension …
[BOOK] AI Now Report 2018
The AI Now Institute at New York University is an interdisciplinary research institute
dedicated to understanding the social implications of AI technologies. It is the first university …
LRS3-TED: a large-scale dataset for visual speech recognition
This paper introduces a new multi-modal dataset for visual and audio-visual speech
recognition. It includes face tracks from over 400 hours of TED and TEDx videos, along with …
Lipreading using temporal convolutional networks
Lip-reading has attracted a lot of research attention lately thanks to advances in deep
learning. The current state-of-the-art model for recognition of isolated words in-the-wild …