Attacks and defenses in user authentication systems: A survey

X Wang, Z Yan, R Zhang, P Zhang - Journal of Network and Computer …, 2021 - Elsevier
User authentication systems (in short, authentication systems) have wide utilization in our
daily life. Unfortunately, existing authentication systems are prone to various attacks while …

Survey on automatic lip-reading in the era of deep learning

A Fernandez-Lopez, FM Sukno - Image and Vision Computing, 2018 - Elsevier
In the last few years, there has been an increasing interest in developing systems for
Automatic Lip-Reading (ALR). Similarly to other computer vision applications, methods …

Balanced multimodal learning via on-the-fly gradient modulation

X Peng, Y Wei, A Deng, D Wang… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
Audio-visual learning helps to comprehensively understand the world, by integrating
different senses. Accordingly, multiple input modalities are expected to boost model …

Sub-word level lip reading with visual attention

KR Prajwal, T Afouras… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
The goal of this paper is to learn strong lip reading models that can recognise speech in
silent videos. Most prior works deal with the open-set visual speech recognition problem by …

Combining residual networks with LSTMs for lipreading

T Stafylakis, G Tzimiropoulos - arXiv preprint arXiv:1703.04105, 2017 - arxiv.org
We propose an end-to-end deep learning architecture for word-level visual speech
recognition. The system is a combination of spatiotemporal convolutional, residual and …

[PDF] Multimodal deep learning.

J Ngiam, A Khosla, M Kim, J Nam, H Lee, AY Ng - ICML, 2011 - academia.edu
Deep networks have been successfully applied to unsupervised feature learning for single
modalities (eg, text, images or audio). In this work, we propose a novel application of deep …

Partial multi-view clustering

SY Li, Y Jiang, ZH Zhou - Proceedings of the AAAI conference on …, 2014 - ojs.aaai.org
Real data often have multiple modalities or come from multiple channels, while multi-view
clustering provides a natural formulation for generating clusters from such data …

Multimodal sparse transformer network for audio-visual speech recognition

Q Song, B Sun, S Li - IEEE Transactions on Neural Networks …, 2022 - ieeexplore.ieee.org
Automatic speech recognition (ASR) is the major human–machine interface in many
intelligent systems, such as intelligent homes, autonomous driving, and servant robots …

Large-scale visual speech recognition

B Shillingford, Y Assael, MW Hoffman, T Paine… - arXiv preprint arXiv …, 2018 - arxiv.org
This work presents a scalable solution to open-vocabulary visual speech recognition. To
achieve this, we constructed the largest existing visual speech recognition dataset …

Multimodal human–computer interaction: A survey

A Jaimes, N Sebe - Computer vision and image understanding, 2007 - Elsevier
In this paper, we review the major approaches to multimodal human–computer interaction,
giving an overview of the field from a computer vision perspective. In particular, we focus on …