Attacks and defenses in user authentication systems: A survey

X Wang, Z Yan, R Zhang, P Zhang - Journal of Network and Computer …, 2021 - Elsevier
User authentication systems (in short, authentication systems) have wide utilization in our
daily life. Unfortunately, existing authentication systems are prone to various attacks while …

Survey on automatic lip-reading in the era of deep learning

A Fernandez-Lopez, FM Sukno - Image and Vision Computing, 2018 - Elsevier
In the last few years, there has been an increasing interest in developing systems for
Automatic Lip-Reading (ALR). Similarly to other computer vision applications, methods …

Balanced multimodal learning via on-the-fly gradient modulation

X Peng, Y Wei, A Deng, D Wang… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
Audio-visual learning helps to comprehensively understand the world, by integrating
different senses. Accordingly, multiple input modalities are expected to boost model …

Sub-word level lip reading with visual attention

KR Prajwal, T Afouras… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
The goal of this paper is to learn strong lip reading models that can recognise speech in
silent videos. Most prior works deal with the open-set visual speech recognition problem by …

Combining residual networks with LSTMs for lipreading

T Stafylakis, G Tzimiropoulos - arXiv preprint arXiv:1703.04105, 2017 - arxiv.org
We propose an end-to-end deep learning architecture for word-level visual speech
recognition. The system is a combination of spatiotemporal convolutional, residual and …

[PDF] Multimodal deep learning.

J Ngiam, A Khosla, M Kim, J Nam, H Lee, AY Ng - ICML, 2011 - academia.edu
Deep networks have been successfully applied to unsupervised feature learning for single
modalities (eg, text, images or audio). In this work, we propose a novel application of deep …

Partial multi-view clustering

SY Li, Y Jiang, ZH Zhou - Proceedings of the AAAI conference on …, 2014 - ojs.aaai.org
Real data often have multiple modalities or come from multiple channels, while multi-view
clustering provides a natural formulation for generating clusters from such data …

Multimodal sparse transformer network for audio-visual speech recognition

Q Song, B Sun, S Li - IEEE Transactions on Neural Networks …, 2022 - ieeexplore.ieee.org
Automatic speech recognition (ASR) is the major human–machine interface in many
intelligent systems, such as intelligent homes, autonomous driving, and servant robots …

Large-scale visual speech recognition

B Shillingford, Y Assael, MW Hoffman, T Paine… - arXiv preprint arXiv …, 2018 - arxiv.org
This work presents a scalable solution to open-vocabulary visual speech recognition. To
achieve this, we constructed the largest existing visual speech recognition dataset …

Multimodal human–computer interaction: A survey

A Jaimes, N Sebe - Computer vision and image understanding, 2007 - Elsevier
In this paper, we review the major approaches to multimodal human–computer interaction,
giving an overview of the field from a computer vision perspective. In particular, we focus on …