Streaming end-to-end speech recognition for mobile devices
End-to-end (E2E) models, which directly predict output character sequences given input
speech, are good candidates for on-device speech recognition. E2E models, however …
Snips voice platform: an embedded spoken language understanding system for private-by-design voice interfaces
This paper presents the machine learning architecture of the Snips Voice Platform, a
software solution to perform Spoken Language Understanding on microprocessors typical of …
Multiple classifiers in biometrics. Part 2: Trends and challenges
The present paper is Part 2 in this series of two papers. In Part 1 we provided an introduction
to Multiple Classifier Systems (MCS) with a focus into the fundamentals: basic nomenclature …
Dynamic adaptive DNN surgery for inference acceleration on the edge
Recent advances in deep neural networks (DNNs) have substantially improved the accuracy
and speed of a variety of intelligent applications. Nevertheless, one obstacle is that DNN …
Wenet 2.0: More productive end-to-end speech recognition toolkit
Recently, we made available WeNet, a production-oriented end-to-end speech recognition
toolkit, which introduces a unified two-pass (U2) framework and a built-in runtime to address …
Lessons learned in transcribing 5000 h of air traffic control communications for robust automatic speech understanding
Voice communication between air traffic controllers (ATCos) and pilots is critical for ensuring
safe and efficient air traffic control (ATC). The handling of these voice communications …
Deep context: end-to-end contextual speech recognition
In automatic speech recognition (ASR) what a user says depends on the particular context
she is in. Typically, this context is represented as a set of word n-grams. In this work, we …
Speech processing for digital home assistants: Combining signal processing with deep-learning techniques
Once a popular theme of futuristic science fiction or far-fetched technology forecasts, digital
home assistants with a spoken language interface have become a ubiquitous commodity …
Shallow-Fusion End-to-End Contextual Biasing
Contextual biasing to a specific domain, including a user's song names, app names and
contact names, is an important component of any production-level automatic speech …
Two-pass end-to-end speech recognition
The requirements for many applications of state-of-the-art speech recognition systems
include not only low word error rate (WER) but also low latency. Specifically, for many use …