A comparison of transformer, convolutional, and recurrent neural networks on phoneme recognition

K Shim, W Sung - arXiv preprint arXiv:2210.00367, 2022 - arxiv.org
Phoneme recognition is a very important part of speech recognition that requires the ability
to extract phonetic features from multiple frames. In this paper, we compare and analyze …

Knowledge Distillation from Non-streaming to Streaming ASR Encoder using Auxiliary Non-streaming Layer

K Shim, J Lee, S Chang, K Hwang - arXiv preprint arXiv:2308.16415, 2023 - arxiv.org
Streaming automatic speech recognition (ASR) models are restricted from accessing future
context, which results in worse performance compared to non-streaming models. To …

ASBERT: ASR-specific self-supervised learning with self-training

HY Kim, BY Kim, SW Yoo, Y Lim… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org
Self-supervised learning (SSL) pre-training generally shows good performance on
various speech processing tasks. However, this pre-training scheme may lead to a sub …

Self-training ASR Guided by Unsupervised ASR Teacher

HY Kim, BY Kim, Y Lim, J Park, S Choi, Y Ju… - Proc. Interspeech …, 2024 - isca-archive.org
Self-training has gained increasing attention due to its notable performance improvement in
speech recognition. However, conventional self-training techniques have two key …

Gain Cell-Based Analog Content Addressable Memory for Dynamic Associative Tasks in AI

PP Manea, N Leroux, E Neftci, JP Strachan - arXiv preprint arXiv …, 2024 - arxiv.org
Analog Content Addressable Memories (aCAMs) have proven useful for associative
in-memory computing applications like Decision Trees, Finite State Machines, and Hyper …

Masked token similarity transfer for compressing transformer-based ASR models

E Choi, Y Lim, BY Kim, HY Kim, H Lee… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Recent self-supervised automatic speech recognition (ASR) models based on transformers
achieve the best performance, but their footprint is too large to be trained on low-resource …

Automatic Speech Recognition Transformer with Global Contextual Information Decoder

Y Qian, X Zhuang, M Wang - Proc. Interspeech 2023, 2023 - isca-archive.org
Most current automatic speech recognition (ASR) models use decoders that do not have
access to global contextual information at the token level. Therefore, we propose a decoder …

DVSA: A Focused and Efficient Sparse Attention Via Explicit Selection for Speech Recognition

M Zhang, J Song, F Xie, K Shi, Z Guo… - Available at SSRN … - papers.ssrn.com
Self-attention (SA) is an integral part of Transformer neural networks and originally
demonstrated its powerful ability to handle text sequences in machine translation tasks …