- Academic Search

Word discovery in visually grounded, self-supervised speech models

P Peng, D Harwath - arXiv preprint arXiv:2203.15081, 2022 - arxiv.org

We present a method for visually-grounded spoken term discovery. After training either a
HuBERT or wav2vec2.0 model to associate spoken captions with natural images, we show …

Cited by 50 · All 6 versions

[PDF] mit.edu

What do self-supervised speech models know about words?

A Pasad, CM Chien, S Settle, K Livescu - Transactions of the …, 2024 - direct.mit.edu

Many self-supervised speech models (S3Ms) have been introduced over the last few years,
improving performance and data efficiency on various speech tasks. However, these …

Cited by 18 · All 4 versions

[PDF] arxiv.org

Self-supervised language learning from raw audio: Lessons from the zero resource speech challenge

E Dunbar, N Hamilakis… - IEEE Journal of Selected …, 2022 - ieeexplore.ieee.org

Recent progress in self-supervised or unsupervised machine learning has opened the
possibility of building a full speech processing system from raw audio without using any …

Cited by 34 · All 7 versions

[PDF] arxiv.org

Word segmentation on discovered phone units with dynamic programming and self-supervised scoring

H Kamper - IEEE/ACM Transactions on Audio, Speech, and …, 2022 - ieeexplore.ieee.org

Recent work on unsupervised speech segmentation has used self-supervised models with
phone and word segmentation modules that are trained jointly. This paper instead revisits …

Cited by 33 · All 4 versions

[PDF] arxiv.org

What do self-supervised speech models know about words?

A Pasad, CM Chien, S Settle, K Livescu - arXiv preprint arXiv:2307.00162, 2023 - arxiv.org

Many self-supervised speech models (S3Ms) have been introduced over the last few years,
producing performance and data efficiency improvements for a variety of speech tasks …

Cited by 12 · All 3 versions

[PDF] neurips.cc

Variable-rate hierarchical CPC leads to acoustic unit discovery in speech

S Cuervo, A Lancucki, R Marxer… - Advances in …, 2022 - proceedings.neurips.cc

The success of deep learning comes from its ability to capture the hierarchical structure of
data by learning high-level representations defined in terms of low-level ones. In this paper …

Cited by 21 · All 9 versions

[PDF] mit.edu

DP-Parse: Finding word boundaries from raw speech with an instance lexicon

R Algayres, T Ricoul, J Karadayi… - Transactions of the …, 2022 - direct.mit.edu

Finding word boundaries in continuous speech is challenging as there is little or no
equivalent of a 'space' delimiter between words. Popular Bayesian non-parametric models …

Cited by 17 · All 10 versions

[PDF] arxiv.org

Syllable Discovery and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Model

P Peng, SW Li, O Räsänen, A Mohamed… - arXiv preprint arXiv …, 2023 - arxiv.org

In this paper, we show that representations capturing syllabic units emerge when training a
self-supervised speech model with a visually-grounded training objective. We demonstrate …

Cited by 6 · All 7 versions

[PDF] arxiv.org

Unsupervised word segmentation using k nearest neighbors

TS Fuchs, Y Hoshen, J Keshet - arXiv preprint arXiv:2204.13094, 2022 - arxiv.org

In this paper, we propose an unsupervised kNN-based approach for word segmentation in
speech utterances. Our method relies on self-supervised pre-trained speech …

Cited by 9 · All 6 versions

[PDF] arxiv.org

Leveraging multilingual transfer for unsupervised semantic acoustic word embeddings

C Jacobs, H Kamper - IEEE Signal Processing Letters, 2023 - ieeexplore.ieee.org

Acoustic word embeddings (AWEs) are fixed-dimensional vector representations of speech
segments that encode phonetic content so that different realisations of the same word have …

Cited by 1 · All 4 versions
