FunASR: A fundamental end-to-end speech recognition toolkit

Z Gao, Z Li, J Wang, H Luo, X Shi, M Chen, Y Li… - arXiv preprint arXiv …, 2023 - arxiv.org
This paper introduces FunASR, an open-source speech recognition toolkit designed to
bridge the gap between academic research and industrial applications. FunASR offers …

Redefining industry 5.0 in ophthalmology and digital metrology: a global perspective

S Chourasia, SM Pandey, Q Murtaza, S Agrawal… - MAPAN, 2023 - Springer
The demand for ophthalmic diagnosis and monitoring equipment is high due to day-by-day
increasing eye-related diseases. These diseases are growing very fast due to changes in …

Robust acoustic and semantic contextual biasing in neural transducers for speech recognition

X Fu, KM Sathyendra, A Gandhe, J Liu… - ICASSP 2023 …, 2023 - ieeexplore.ieee.org
Attention-based contextual biasing approaches have shown significant improvements in the
recognition of generic and/or personal rare-words in End-to-End Automatic Speech …

Lookahead when it matters: Adaptive non-causal transformers for streaming neural transducers

G Strimel, Y Xie, BJ King, M Radfar… - International …, 2023 - proceedings.mlr.press
Streaming speech recognition architectures are employed for low-latency, real-time
applications. Such architectures are often characterized by their causality. Causal …

Interformer: Interactive local and global features fusion for automatic speech recognition

ZH Lai, TH Zhang, Q Liu, X Qian, LF Wei… - arXiv preprint arXiv …, 2023 - arxiv.org
The local and global features are both essential for automatic speech recognition (ASR).
Many recent methods have verified that simply combining local and global features can …

Conmer: Streaming Conformer without self-attention for interactive voice assistants

M Radfar, P Lyskawa, B Trujillo, Y Xie, K Zhen… - 2023 - amazon.science
Conformer is an extension of transformer-based neural ASR models whose fundamental
component is the self-attention module. In this paper, we show that we can remove the self …

Exploring RWKV for memory efficient and low latency streaming ASR

K An, S Zhang - arXiv preprint arXiv:2309.14758, 2023 - arxiv.org
Recently, self-attention-based transformers and conformers have been introduced as
alternatives to RNNs for ASR acoustic modeling. Nevertheless, the full-sequence attention …

Grammar-supervised end-to-end speech recognition with part-of-speech tagging and dependency parsing

G Wan, T Mao, J Zhang, H Chen, J Gao, Z Ye - Applied Sciences, 2023 - mdpi.com
For most automatic speech recognition systems, many unacceptable hypothesis errors still
make the recognition results absurd and difficult to understand. In this paper, we introduce …

A Compact End-to-End Model with Local and Global Context for Spoken Language Identification

F Jia, NR Koluguri, J Balam, B Ginsburg - arXiv preprint arXiv:2210.15781, 2022 - arxiv.org
We introduce TitaNet-LID, a compact end-to-end neural network for Spoken Language
Identification (LID) that is based on the ContextNet architecture. TitaNet-LID employs 1D …

Dual-attention neural transducers for efficient wake word spotting in speech recognition

SY Sahai, J Liu, T Muniyappa… - ICASSP 2023 …, 2023 - ieeexplore.ieee.org
We present dual-attention neural biasing, an architecture designed to boost Wake Words
(WW) recognition and improve inference time latency on speech recognition tasks. This …