Unsupervised automatic speech recognition: A review
Automatic Speech Recognition (ASR) systems can be trained to achieve
remarkable performance given large amounts of manually transcribed speech, but large …
Unsupervised learning of spoken language with visual context
Humans learn to speak before they can read or write, so why can't computers do the same?
In this paper, we present a deep neural network model capable of rudimentary spoken …
Jointly discovering visual objects and spoken words from raw sensory input
In this paper, we explore neural network models that learn to associate segments of spoken
audio captions with the semantically relevant portions of natural images that they refer to …
Effectiveness of self-supervised pre-training for speech recognition
We compare self-supervised representation learning algorithms which either explicitly
quantize the audio data or learn representations without quantization. We find the former to …
Tied multitask learning for neural speech translation
We explore multitask models for neural translation of speech, augmenting them in order to
reflect two intuitive notions. First, we introduce a model where the second task decoder …
Pre-training on high-resource speech recognition improves low-resource speech-to-text translation
We present a simple approach to improve direct speech-to-text translation (ST) when the
source language is low-resource: we pre-train the model on a high-resource automatic …
Recent developments in spoken term detection: a survey
A Mandal, KR Prasanna Kumar, P Mitra - International Journal of Speech …, 2014 - Springer
Spoken term detection (STD) provides an efficient means for content based indexing of
speech. However, achieving high detection performance, faster speed, detecting out-of …
Word discovery in visually grounded, self-supervised speech models
We present a method for visually-grounded spoken term discovery. After training either a
HuBERT or wav2vec 2.0 model to associate spoken captions with natural images, we show …
Learning hierarchical discrete linguistic units from visually-grounded speech
In this paper, we present a method for learning discrete linguistic units by incorporating
vector quantization layers into neural models of visually grounded speech. We show that our …