Streaming end-to-end speech recognition for mobile devices

Y He, TN Sainath, R Prabhavalkar… - ICASSP 2019-2019 …, 2019 - ieeexplore.ieee.org
End-to-end (E2E) models, which directly predict output character sequences given input
speech, are good candidates for on-device speech recognition. E2E models, however …

Snips voice platform: an embedded spoken language understanding system for private-by-design voice interfaces

A Coucke, A Saade, A Ball, T Bluche, A Caulier… - arXiv preprint arXiv …, 2018 - arxiv.org
This paper presents the machine learning architecture of the Snips Voice Platform, a
software solution to perform Spoken Language Understanding on microprocessors typical of …

Multiple classifiers in biometrics. Part 2: Trends and challenges

J Fierrez, A Morales, R Vera-Rodriguez, D Camacho - Information Fusion, 2018 - Elsevier
The present paper is Part 2 in this series of two papers. In Part 1 we provided an introduction
to Multiple Classifier Systems (MCS) with a focus on the fundamentals: basic nomenclature …

Dynamic adaptive DNN surgery for inference acceleration on the edge

C Hu, W Bao, D Wang, F Liu - IEEE INFOCOM 2019-IEEE …, 2019 - ieeexplore.ieee.org
Recent advances in deep neural networks (DNNs) have substantially improved the accuracy
and speed of a variety of intelligent applications. Nevertheless, one obstacle is that DNN …

WeNet 2.0: More productive end-to-end speech recognition toolkit

B Zhang, D Wu, Z Peng, X Song, Z Yao, H Lv… - arXiv preprint arXiv …, 2022 - arxiv.org
Recently, we made available WeNet, a production-oriented end-to-end speech recognition
toolkit, which introduces a unified two-pass (U2) framework and a built-in runtime to address …

Lessons learned in transcribing 5000 h of air traffic control communications for robust automatic speech understanding

J Zuluaga-Gomez, I Nigmatulina, A Prasad, P Motlicek… - Aerospace, 2023 - mdpi.com
Voice communication between air traffic controllers (ATCos) and pilots is critical for ensuring
safe and efficient air traffic control (ATC). The handling of these voice communications …

Deep context: end-to-end contextual speech recognition

G Pundak, TN Sainath, R Prabhavalkar… - 2018 IEEE spoken …, 2018 - ieeexplore.ieee.org
In automatic speech recognition (ASR), what a user says depends on the particular context
she is in. Typically, this context is represented as a set of word n-grams. In this work, we …

Speech processing for digital home assistants: Combining signal processing with deep-learning techniques

R Haeb-Umbach, S Watanabe… - IEEE Signal …, 2019 - ieeexplore.ieee.org
Once a popular theme of futuristic science fiction or far-fetched technology forecasts, digital
home assistants with a spoken language interface have become a ubiquitous commodity …

Shallow-Fusion End-to-End Contextual Biasing

D Zhao, TN Sainath, D Rybach, P Rondon, D Bhatia… - Interspeech, 2019 - isca-archive.org
Contextual biasing to a specific domain, including a user's song names, app names and
contact names, is an important component of any production-level automatic speech …
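
For orientation, shallow fusion generally means interpolating the end-to-end model's score with an external (here, contextual) language model score during beam search; a minimal sketch of the decoding criterion follows, where the weight λ and the symbols P_E2E and P_C are illustrative notation rather than values taken from this paper:

    y^* = \arg\max_{y} \big[ \log P_{\mathrm{E2E}}(y \mid x) + \lambda \, \log P_{\mathrm{C}}(y) \big]

Here x is the input speech, y a candidate hypothesis, and P_C a biasing model built from the user's contextual entities (e.g., song, app, and contact names).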

Two-pass end-to-end speech recognition

TN Sainath, R Pang, D Rybach, Y He… - arXiv preprint arXiv …, 2019 - arxiv.org
The requirements for many applications of state-of-the-art speech recognition systems
include not only low word error rate (WER) but also low latency. Specifically, for many use …