FunASR: A fundamental end-to-end speech recognition toolkit
This paper introduces FunASR, an open-source speech recognition toolkit designed to
bridge the gap between academic research and industrial applications. FunASR offers …
Redefining Industry 5.0 in ophthalmology and digital metrology: a global perspective
The demand for ophthalmic diagnosis and monitoring equipment is high due to the steadily rising
incidence of eye-related diseases. These diseases are growing rapidly due to changes in …
Robust acoustic and semantic contextual biasing in neural transducers for speech recognition
Attention-based contextual biasing approaches have shown significant improvements in the
recognition of generic and/or personal rare-words in End-to-End Automatic Speech …
Lookahead when it matters: Adaptive non-causal transformers for streaming neural transducers
Streaming speech recognition architectures are employed for low-latency, real-time
applications. Such architectures are often characterized by their causality. Causal …
Interformer: Interactive local and global features fusion for automatic speech recognition
Local and global features are both essential for automatic speech recognition (ASR).
Many recent methods have verified that simply combining local and global features can …
Conmer: Streaming Conformer without self-attention for interactive voice assistants
Conformer is an extension of transformer-based neural ASR models whose fundamental
component is the self-attention module. In this paper, we show that we can remove the self …
Exploring RWKV for memory-efficient and low-latency streaming ASR
Recently, self-attention-based transformers and conformers have been introduced as
alternatives to RNNs for ASR acoustic modeling. Nevertheless, the full-sequence attention …
Grammar-supervised end-to-end speech recognition with part-of-speech tagging and dependency parsing
For most automatic speech recognition systems, many unacceptable hypothesis errors still
render the recognition results nonsensical and difficult to understand. In this paper, we introduce …
A Compact End-to-End Model with Local and Global Context for Spoken Language Identification
We introduce TitaNet-LID, a compact end-to-end neural network for Spoken Language
Identification (LID) that is based on the ContextNet architecture. TitaNet-LID employs 1D …
Dual-attention neural transducers for efficient wake word spotting in speech recognition
We present dual-attention neural biasing, an architecture designed to boost Wake Words
(WW) recognition and improve inference time latency on speech recognition tasks. This …