Google Scholar

Artificial intelligence in the creative industries: a review

N Anantrasirichai, D Bull - Artificial Intelligence Review, 2022 - Springer

This paper reviews the current state of the art in artificial intelligence (AI) technologies and
applications in the context of the creative industries. A brief background of AI, and …

Cited by 665

Deep learning for visual speech analysis: A survey

C Sheng, G Kuang, L Bai, C Hou, Y Guo… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org

Visual speech, referring to the visual domain of speech, has attracted increasing attention
due to its wide applications, such as public security, medical treatment, military defense, and …

Cited by 46

Learning audio-visual speech representation by masked multimodal cluster prediction

B Shi, WN Hsu, K Lakhotia, A Mohamed - arXiv preprint arXiv:2201.02184, 2022 - arxiv.org

Video recordings of speech contain correlated audio and visual information, providing a
strong signal for speech representation learning from the speaker's lip movements and the …

Cited by 331

Visual speech recognition for multiple languages in the wild

P Ma, S Petridis, M Pantic - Nature Machine Intelligence, 2022 - nature.com

Visual speech recognition (VSR) aims to recognize the content of speech based on lip
movements, without relying on the audio stream. Advances in deep learning and the …

Cited by 149

End-to-end audio-visual speech recognition with conformers

P Ma, S Petridis, M Pantic - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org

In this work, we present a hybrid CTC/Attention model based on a ResNet-18 and
Convolution-augmented transformer (Conformer), that can be trained in an end-to-end …

Cited by 267

Deep audio-visual speech recognition

T Afouras, JS Chung, A Senior… - IEEE transactions on …, 2018 - ieeexplore.ieee.org

The goal of this work is to recognise phrases and sentences being spoken by a talking face,
with or without the audio. Unlike previous works that have focussed on recognising a limited …

Cited by 964

Group normalization

Y Wu, K He - Proceedings of the European conference on …, 2018 - openaccess.thecvf.com

Batch Normalization (BN) is a milestone technique in the development of deep learning,
enabling various networks to train. However, normalizing along the batch dimension …

Cited by 4668

[BOOK][B] AI Now Report 2018

M Whittaker, K Crawford, R Dobbe, G Fried, E Kaziunas… - 2018 - stc.org

The AI Now Institute at New York University is an interdisciplinary research institute
dedicated to understanding the social implications of AI technologies. It is the first university …

Cited by 499

LRS3-TED: a large-scale dataset for visual speech recognition

T Afouras, JS Chung, A Zisserman - arXiv preprint arXiv:1809.00496, 2018 - arxiv.org

This paper introduces a new multi-modal dataset for visual and audio-visual speech
recognition. It includes face tracks from over 400 hours of TED and TEDx videos, along with …

Cited by 498

Lipreading using temporal convolutional networks

B Martinez, P Ma, S Petridis… - ICASSP 2020-2020 IEEE …, 2020 - ieeexplore.ieee.org

Lip-reading has attracted a lot of research attention lately thanks to advances in deep
learning. The current state-of-the-art model for recognition of isolated words in-the-wild …

Cited by 311
