An overview of deep-learning-based audio-visual speech enhancement and separation

D Michelsanti, ZH Tan, SX Zhang, Y Xu… - … on Audio, Speech …, 2021 - ieeexplore.ieee.org

Speech enhancement and speech separation are two related tasks, whose purpose is to
extract either one or more target speech signals, respectively, from a mixture of sounds …

Cited by 304 · All 6 versions

[PDF] mdpi.com

Attention-Based Fusion of Ultrashort Voice Utterances and Depth Videos for Multimodal Person Identification

A Moufidi, D Rousseau, P Rasti - Sensors, 2023 - mdpi.com

Multimodal deep learning, in the context of biometrics, encounters significant challenges
due to the dependence on long speech utterances and RGB images, which are often …

Cited by 6 · All 8 versions

[PDF] ieee.org

A Comprehensive Review of Recent Advances in Deep Neural Networks for Lipreading with Sign Language Recognition

N Rathipriya, N Maheswari - IEEE Access, 2024 - ieeexplore.ieee.org

Lip reading is a form of “listening” to people that happens visually. It's also referred to as
“Speech reading.” This is done by observing the speaker's face and listening to the spoken …

[PDF] arxiv.org

Multimodal integration for large-vocabulary audio-visual speech recognition

W Yu, S Zeiler, D Kolossa - 2020 28th European Signal …, 2021 - ieeexplore.ieee.org

For many small- and medium-vocabulary tasks, audio-visual speech recognition can
significantly improve the recognition rates compared to audio-only systems. However, there …

Cited by 16 · All 8 versions

[PDF] arxiv.org

Audiovisual speaker tracking using nonlinear dynamical systems with dynamic stream weights

C Schymura, D Kolossa - IEEE/ACM Transactions on Audio …, 2020 - ieeexplore.ieee.org

Data fusion plays an important role in many technical applications that require efficient
processing of multimodal sensory observations. A prominent example is audiovisual signal …

Cited by 10 · All 4 versions

[PDF] github.io

A dynamic stream weight backprop Kalman filter for audiovisual speaker tracking

C Schymura, T Ochiai, M Delcroix… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org

Audiovisual speaker tracking is an application that has been tackled by a wide range of
classical approaches based on Gaussian filters, most notably the well-known Kalman filter …

Cited by 7 · All 2 versions

[PDF] github.io

Extending linear dynamical systems with dynamic stream weights for audiovisual speaker localization

C Schymura, T Isenberg… - 2018 16th International …, 2018 - ieeexplore.ieee.org

An important aspect of audiovisual speaker localization is the appropriate fusion of acoustic
and visual observations based on their time-varying reliability. In this study, a framework …

Cited by 6 · All 2 versions

[PDF] github.io

Learning dynamic stream weights for linear dynamical systems using natural evolution strategies

C Schymura, D Kolossa - ICASSP 2019-2019 IEEE …, 2019 - ieeexplore.ieee.org

Multimodal data fusion is an important aspect of many object localization and tracking
frameworks that rely on sensory observations from different sources. A prominent example is …

Cited by 7 · All 3 versions

[PDF] arxiv.org

Data fusion for audiovisual speaker localization: Extending dynamic stream weights to the spatial domain

J Wissing, B Boenninghoff, D Kolossa… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org

Estimating the positions of multiple speakers can be helpful for tasks like automatic speech
recognition or speaker diarization. Both applications benefit from a known speaker position …

Cited by 3 · All 4 versions

[PDF] hal.science

Machine Learning-Based Multimodal integration for Short Utterance-Based Biometrics Identification and Engagement Detection

A Moufidi - 2024 - theses.hal.science

The rapid advancement and democratization of technology have led to an abundance of
sensors. Consequently, the integration of these diverse modalities presents an advantage …
