Demand side management through load shifting in IoT based HEMS: Overview, challenges and opportunities
In the smart grid era, demand side management (DSM) plays an indispensable role in the
development of sustainable cities and societies. This paper presents practical challenges …
Robust speech recognition via large-scale weak supervision
We study the capabilities of speech processing systems trained simply to predict large
amounts of transcripts of audio on the internet. When scaled to 680,000 hours of multilingual …
SpeechBrain: A general-purpose speech toolkit
SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to facilitate the
research and development of neural speech processing technologies by being simple …
BigVGAN: A universal neural vocoder with large-scale training
Despite recent progress in generative adversarial network (GAN)-based vocoders, where
the model generates raw waveform conditioned on acoustic features, it is challenging to …
TorchAudio: Building blocks for audio and speech processing
This document describes version 0.10 of TorchAudio: building blocks for machine learning
applications in the audio and speech processing domain. The objective of TorchAudio is to …
Auto-AVSR: Audio-visual speech recognition with automatic labels
Audio-visual speech recognition has received a lot of attention due to its robustness against
acoustic noise. Recently, the performance of automatic, visual, and audio-visual speech …
QuartzNet: Deep automatic speech recognition with 1D time-channel separable convolutions
We propose a new end-to-end neural acoustic model for automatic speech recognition. The
model is composed of multiple blocks with residual connections between them. Each block …
Generative spoken dialogue language modeling
We introduce dGSLM, the first “textless” model able to generate audio samples of naturalistic
spoken dialogues. It uses recent work on unsupervised spoken unit discovery coupled with …
Guided-TTS: A diffusion model for text-to-speech via classifier guidance
We propose Guided-TTS, a high-quality text-to-speech (TTS) model that does not require
any transcript of target speaker using classifier guidance. Guided-TTS combines an …
ESPnet-ST: All-in-one speech translation toolkit
We present ESPnet-ST, which is designed for the quick development of speech-to-speech
translation systems in a single framework. ESPnet-ST is a new project inside end-to-end …