- Academic Search

Self-supervised speech representation learning: A review

A Mohamed, H Lee, L Borgholt… - IEEE Journal of …, 2022 - ieeexplore.ieee.org

Although supervised deep learning has revolutionized speech and audio processing, it has
necessitated the building of specialist models for individual tasks and application scenarios …

Cited by 404. All 10 versions.

Multimodal machine learning: A survey and taxonomy

T Baltrušaitis, C Ahuja… - IEEE transactions on …, 2018 - ieeexplore.ieee.org

Our experience of the world is multimodal: we see objects, hear sounds, feel texture, smell
odors, and taste flavors. Modality refers to the way in which something happens or is …

Cited by 3891. All 12 versions.

Trusted multi-view classification with dynamic evidential fusion

Z Han, C Zhang, H Fu, JT Zhou - IEEE transactions on pattern …, 2022 - ieeexplore.ieee.org

Existing multi-view classification algorithms focus on promoting accuracy by exploiting
different views, typically integrating them into common representations for follow-up tasks …

Cited by 395. All 9 versions.

Visual speech recognition for multiple languages in the wild

P Ma, S Petridis, M Pantic - Nature Machine Intelligence, 2022 - nature.com

Visual speech recognition (VSR) aims to recognize the content of speech based on lip
movements, without relying on the audio stream. Advances in deep learning and the …

Cited by 148. All 7 versions.

End-to-end audio-visual speech recognition with conformers

P Ma, S Petridis, M Pantic - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org

In this work, we present a hybrid CTC/Attention model based on a ResNet-18 and
Convolution-augmented transformer (Conformer), that can be trained in an end-to-end …

Cited by 266. All 4 versions.

MAViL: Masked audio-video learners

PY Huang, V Sharma, H Xu, C Ryali… - Advances in …, 2024 - proceedings.neurips.cc

We present Masked Audio-Video Learners (MAViL) to learn audio-visual
representations with three complementary forms of self-supervision: (1) reconstructing …

Cited by 67. All 6 versions.

Lipreading using temporal convolutional networks

B Martinez, P Ma, S Petridis… - ICASSP 2020-2020 IEEE …, 2020 - ieeexplore.ieee.org

Lip-reading has attracted a lot of research attention lately thanks to advances in deep
learning. The current state-of-the-art model for recognition of isolated words in-the-wild …

Cited by 311. All 3 versions.

Sub-word level lip reading with visual attention

KR Prajwal, T Afouras… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com

The goal of this paper is to learn strong lip reading models that can recognise speech in
silent videos. Most prior works deal with the open-set visual speech recognition problem by …

Cited by 109. All 12 versions.

Audiovisual SlowFast networks for video recognition

F Xiao, YJ Lee, K Grauman, J Malik… - arXiv preprint arXiv …, 2020 - arxiv.org

We present Audiovisual SlowFast Networks, an architecture for integrated audiovisual
perception. AVSlowFast has Slow and Fast visual pathways that are deeply integrated with a …

Cited by 260. All 2 versions.

Audio-visual speech and gesture recognition by sensors of mobile devices

D Ryumin, D Ivanko, E Ryumina - Sensors, 2023 - mdpi.com

Audio-visual speech recognition (AVSR) is one of the most promising solutions for reliable
speech recognition, particularly when audio is corrupted by noise. Additional visual …

Cited by 75. All 9 versions.
