P Peng, D Harwath - arXiv preprint arXiv:2203.15081, 2022 - arxiv.org

We present a method for visually-grounded spoken term discovery. After training either a
HuBERT or wav2vec2.0 model to associate spoken captions with natural images, we show …

Cited by 50 (6 versions)

What do self-supervised speech models know about words?

A Pasad, CM Chien, S Settle, K Livescu - Transactions of the …, 2024 - direct.mit.edu

Many self-supervised speech models (S3Ms) have been introduced over the last few years,
improving performance and data efficiency on various speech tasks. However, these …

Cited by 18 (4 versions)

Phone-to-audio alignment without text: A semi-supervised approach

J Zhu, C Zhang, D Jurgens - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org

The task of phone-to-audio alignment has many applications in speech research. Here we
introduce two Wav2Vec2-based models for both text-dependent and text-independent …

Cited by 58 (5 versions)

Self-supervised language learning from raw audio: Lessons from the zero resource speech challenge

E Dunbar, N Hamilakis… - IEEE Journal of Selected …, 2022 - ieeexplore.ieee.org

Recent progress in self-supervised or unsupervised machine learning has opened the
possibility of building a full speech processing system from raw audio without using any …

Cited by 34 (7 versions)

Word segmentation on discovered phone units with dynamic programming and self-supervised scoring

H Kamper - IEEE/ACM Transactions on Audio, Speech, and …, 2022 - ieeexplore.ieee.org

Recent work on unsupervised speech segmentation has used self-supervised models with
phone and word segmentation modules that are trained jointly. This paper instead revisits …

Cited by 33 (4 versions)

A brief overview of unsupervised neural speech representation learning

L Borgholt, JD Havtorn, J Edin, L Maaløe… - arXiv preprint arXiv …, 2022 - arxiv.org

Unsupervised representation learning for speech processing has matured greatly in the last
few years. Work in computer vision and natural language processing has paved the way, but …

Cited by 13 (5 versions)

What do self-supervised speech models know about words?

A Pasad, CM Chien, S Settle, K Livescu - arXiv preprint arXiv:2307.00162, 2023 - arxiv.org

Many self-supervised speech models (S3Ms) have been introduced over the last few years,
producing performance and data efficiency improvements for a variety of speech tasks …

Cited by 12 (3 versions)

Variable-rate hierarchical CPC leads to acoustic unit discovery in speech

S Cuervo, A Lancucki, R Marxer… - Advances in …, 2022 - proceedings.neurips.cc

The success of deep learning comes from its ability to capture the hierarchical structure of
data by learning high-level representations defined in terms of low-level ones. In this paper …

Cited by 21 (9 versions)

Efficient transformers with dynamic token pooling

P Nawrot, J Chorowski, A Łańcucki… - arXiv preprint arXiv …, 2022 - arxiv.org

Transformers achieve unrivalled performance in modelling language, but remain inefficient
in terms of memory and time complexity. A possible remedy is to reduce the sequence …

Cited by 30 (4 versions)

On compressing sequences for self-supervised speech models

Y Meng, HJ Chen, J Shi, S Watanabe… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org

Compressing self-supervised models has become increasingly necessary, as self-supervised
models become larger. While previous approaches have primarily focused on …

Cited by 16 (5 versions)

Segmental contrastive predictive coding for unsupervised word segmentation

Word discovery in visually grounded, self-supervised speech models
