wav2vec: Unsupervised pre-training for speech recognition

S Schneider, A Baevski, R Collobert, M Auli - arXiv preprint arXiv …, 2019 - arxiv.org
We explore unsupervised pre-training for speech recognition by learning representations of
raw audio. wav2vec is trained on large amounts of unlabeled audio data and the resulting …

Latent backdoor attacks on deep neural networks

Y Yao, H Li, H Zheng, BY Zhao - Proceedings of the 2019 ACM SIGSAC …, 2019 - dl.acm.org
Recent work proposed the concept of backdoor attacks on deep neural networks (DNNs),
where misclassification rules are hidden inside normal models, only to be triggered by very …

Evolutionary transfer optimization: a new frontier in evolutionary computation research

KC Tan, L Feng, M Jiang - IEEE Computational Intelligence …, 2021 - ieeexplore.ieee.org
The evolutionary algorithm (EA) is a nature-inspired population-based search method that
works on Darwinian principles of natural selection. Due to its strong search capability and …

Speech model pre-training for end-to-end spoken language understanding

L Lugosch, M Ravanelli, P Ignoto, VS Tomar… - arXiv preprint arXiv …, 2019 - arxiv.org
Whereas conventional spoken language understanding (SLU) systems map speech to text,
and then text to intent, end-to-end SLU systems map speech directly to intent through a …

Image synthesis under limited data: A survey and taxonomy

M Yang, Z Wang - International Journal of Computer Vision, 2025 - Springer
Deep generative models, which target reproducing the data distribution to produce novel
images, have made unprecedented advancements in recent years. However, one critical …

Deep learning-based late fusion of multimodal information for emotion classification of music video

YR Pandeya, J Lee - Multimedia Tools and Applications, 2021 - Springer
Affective computing is an emerging area of research that aims to enable intelligent systems
to recognize, feel, infer and interpret human emotions. The widely spread online and off-line …

UniSpeech: Unified speech representation learning with labeled and unlabeled data

C Wang, Y Wu, Y Qian, K Kumatani… - International …, 2021 - proceedings.mlr.press
In this paper, we propose a unified pre-training approach called UniSpeech to learn speech
representations with both labeled and unlabeled data, in which supervised phonetic CTC …

Rethinking evaluation in ASR: Are our models robust enough?

T Likhomanenko, Q Xu, V Pratap, P Tomasello… - arXiv preprint arXiv …, 2020 - arxiv.org
Is pushing numbers on a single benchmark valuable in automatic speech recognition?
Research results in acoustic modeling are typically evaluated based on performance on a …

With great training comes great vulnerability: Practical attacks against transfer learning

B Wang, Y Yao, B Viswanath, H Zheng… - 27th USENIX security …, 2018 - usenix.org
Transfer learning is a powerful approach that allows users to quickly build accurate deep-
learning (Student) models by "learning" from centralized (Teacher) models pretrained with …

Multilingual speech recognition for Turkic languages

S Mussakhojayeva, K Dauletbek, R Yeshpanov… - Information, 2023 - mdpi.com
The primary aim of this study was to contribute to the development of multilingual automatic
speech recognition for lower-resourced Turkic languages. Ten languages—Azerbaijani …