Demand side management through load shifting in IoT based HEMS: Overview, challenges and opportunities

S Sharda, M Singh, K Sharma - Sustainable Cities and Society, 2021 - Elsevier
In smart grid era, demand side management (DSM) plays an indispensable role in
development of sustainable cities and societies. This paper presents practical challenges …

Robust speech recognition via large-scale weak supervision

A Radford, JW Kim, T Xu, G Brockman… - International …, 2023 - proceedings.mlr.press
We study the capabilities of speech processing systems trained simply to predict large
amounts of transcripts of audio on the internet. When scaled to 680,000 hours of multilingual …

SpeechBrain: A general-purpose speech toolkit

M Ravanelli, T Parcollet, P Plantinga, A Rouhe… - arXiv preprint arXiv …, 2021 - arxiv.org
SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to facilitate the
research and development of neural speech processing technologies by being simple …

BigVGAN: A universal neural vocoder with large-scale training

S Lee, W Ping, B Ginsburg, B Catanzaro… - arXiv preprint arXiv …, 2022 - arxiv.org
Despite recent progress in generative adversarial network (GAN)-based vocoders, where
the model generates raw waveform conditioned on acoustic features, it is challenging to …

TorchAudio: Building blocks for audio and speech processing

YY Yang, M Hira, Z Ni, A Astafurov… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
This document describes version 0.10 of TorchAudio: building blocks for machine learning
applications in the audio and speech processing domain. The objective of TorchAudio is to …

Auto-AVSR: Audio-visual speech recognition with automatic labels

P Ma, A Haliassos, A Fernandez-Lopez… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Audio-visual speech recognition has received a lot of attention due to its robustness against
acoustic noise. Recently, the performance of automatic, visual, and audio-visual speech …

QuartzNet: Deep automatic speech recognition with 1D time-channel separable convolutions

S Kriman, S Beliaev, B Ginsburg… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
We propose a new end-to-end neural acoustic model for automatic speech recognition. The
model is composed of multiple blocks with residual connections between them. Each block …

Generative spoken dialogue language modeling

TA Nguyen, E Kharitonov, J Copet, Y Adi… - Transactions of the …, 2023 - direct.mit.edu
We introduce dGSLM, the first “textless” model able to generate audio samples of naturalistic
spoken dialogues. It uses recent work on unsupervised spoken unit discovery coupled with …

Guided-TTS: A diffusion model for text-to-speech via classifier guidance

H Kim, S Kim, S Yoon - International Conference on …, 2022 - proceedings.mlr.press
We propose Guided-TTS, a high-quality text-to-speech (TTS) model that does not require
any transcript of target speaker using classifier guidance. Guided-TTS combines an …

ESPnet-ST: All-in-one speech translation toolkit

H Inaguma, S Kiyono, K Duh, S Karita… - arXiv preprint arXiv …, 2020 - arxiv.org
We present ESPnet-ST, which is designed for the quick development of speech-to-speech
translation systems in a single framework. ESPnet-ST is a new project inside end-to-end …