- Academic Search

Recent developments on espnet toolkit boosted by conformer

P Guo, F Boyer, X Chang, T Hayashi… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org

In this study, we present recent developments on ESPnet: End-to-End Speech Processing
toolkit, which mainly involves a recently proposed architecture called Conformer …

Cited by 304, 8 versions

[PDF] acm.org

Split computing and early exiting for deep learning applications: Survey and research challenges

Y Matsubara, M Levorato, F Restuccia - ACM Computing Surveys, 2022 - dl.acm.org

Mobile devices such as smartphones and autonomous vehicles increasingly rely on deep
neural networks (DNNs) to execute complex inference tasks such as image classification …

Cited by 230, 5 versions

[PDF] jmlr.org

Scaling speech technology to 1,000+ languages

V Pratap, A Tjandra, B Shi, P Tomasello, A Babu… - Journal of Machine …, 2024 - jmlr.org

Expanding the language coverage of speech technology has the potential to improve
access to information for many more people. However, current speech technology is …

Cited by 293, 3 versions

[PDF] arxiv.org

Neural codec language models are zero-shot text to speech synthesizers

C Wang, S Chen, Y Wu, Z Zhang, L Zhou, S Liu… - arxiv preprint arxiv …, 2023 - arxiv.org

We introduce a language modeling approach for text to speech synthesis (TTS). Specifically,
we train a neural codec language model (called Vall-E) using discrete codes derived from …

Cited by 623, 3 versions

[PDF] nature.com

A high-performance speech neuroprosthesis

FR Willett, EM Kunz, C Fan, DT Avansino, GH Wilson… - Nature, 2023 - nature.com

Speech brain–computer interfaces (BCIs) have the potential to restore rapid communication
to people with paralysis by decoding neural activity evoked by attempted speech into text, or …

Cited by 308, 16 versions

[PDF] neurips.cc

Voicebox: Text-guided multilingual universal speech generation at scale

M Le, A Vyas, B Shi, B Karrer, L Sari… - Advances in neural …, 2024 - proceedings.neurips.cc

Large-scale generative models such as GPT and DALL-E have revolutionized the research
community. These models not only generate high fidelity outputs, but are also generalists …

Cited by 244, 8 versions

[PDF] thecvf.com

Ego4d: Around the world in 3,000 hours of egocentric video

K Grauman, A Westbury, E Byrne… - Proceedings of the …, 2022 - openaccess.thecvf.com

We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It
offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household …

Cited by 986, 13 versions

[PDF] neurips.cc

Masked autoencoders that listen

PY Huang, H Xu, J Li, A Baevski… - Advances in …, 2022 - proceedings.neurips.cc

This paper studies a simple extension of image-based Masked Autoencoders (MAE) to self-
supervised representation learning from audio spectrograms. Following the Transformer …

Cited by 254, 5 versions

[PDF] arxiv.org

SpeechBrain: A general-purpose speech toolkit

M Ravanelli, T Parcollet, P Plantinga, A Rouhe… - arxiv preprint arxiv …, 2021 - arxiv.org

SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to facilitate the
research and development of neural speech processing technologies by being simple …

Cited by 747, 5 versions

[PDF] neurips.cc

Unsupervised speech recognition

A Baevski, WN Hsu, A Conneau… - Advances in Neural …, 2021 - proceedings.neurips.cc

Despite rapid progress in the recent past, current speech recognition systems still require
labeled training data which limits this technology to a small fraction of the languages spoken …

Cited by 331, 6 versions

