Google Scholar

Self-supervised speech representation learning: A review

A Mohamed, H Lee, L Borgholt… - IEEE Journal of …, 2022 - ieeexplore.ieee.org

Although supervised deep learning has revolutionized speech and audio processing, it has
necessitated the building of specialist models for individual tasks and application scenarios …

Cited by 406

[PDF] arxiv.org

Self-supervised fine-tuning for improved content representations by speaker-invariant clustering

HJ Chang, AH Liu, J Glass - arXiv preprint arXiv:2305.11072, 2023 - arxiv.org

Self-supervised speech representation models have succeeded in various tasks, but
improving them for content-related problems using unlabeled data is challenging. We …

Cited by 17

[HTML] amazon.science

Domain adaptation with external off-policy acoustic catalogs for scalable contextual end-to-end automated speech recognition

DM Chan, S Ghosh, A Rastrow… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org

Despite improvements to the generalization performance of automated speech recognition
(ASR) models, specializing ASR models for downstream tasks remains a challenging task …

Cited by 12

[PDF] aclanthology.org

CCSRD: Content-centric speech representation disentanglement learning for end-to-end speech translation

X Zhao, H Sun, Y Lei, S Zhu… - Findings of the Association …, 2023 - aclanthology.org

Deep neural networks have demonstrated their capacity in extracting features from speech
inputs. However, these features may include non-linguistic speech factors such as timbre …

Cited by 5

[PDF] arxiv.org

Representation Purification for End-to-End Speech Translation

C Zhang, Y Zhou, R Zhao, Y Chen, X Shi - arXiv preprint arXiv:2412.04266, 2024 - arxiv.org

Speech-to-text translation (ST) is a cross-modal task that involves converting spoken
language into text in a different language. Previous research primarily focused on …

[PDF] arxiv.org

R-Spin: Efficient Speaker and Noise-invariant Representation Learning with Acoustic Pieces

HJ Chang, J Glass - arXiv preprint arXiv:2311.09117, 2023 - arxiv.org

This paper introduces Robust Spin (R-Spin), a data-efficient domain-specific
self-supervision method for speaker and noise-invariant speech representations by learning …

Cited by 1

[PDF] proquest.com

Perturbation-invariant Speech Representation Learning by Online Clustering

HJ Chang - 2024 - search.proquest.com

Despite success across various tasks, self-supervised speech models face significant
challenges in enhancing content-related performance with unlabeled data, requiring …

[PDF] arxiv.org

Multi-stage multi-modal pre-training for automatic speech recognition

Y Jain, D Chan, P Dheram, A Khare… - arXiv preprint arXiv …, 2024 - arxiv.org

Recent advances in machine learning have demonstrated that multi-modal pre-training can
improve automatic speech recognition (ASR) performance compared to randomly initialized …

Cited by 2

[PDF] arxiv.org

Task oriented dialogue as a catalyst for self-supervised automatic speech recognition

DM Chan, S Ghosh, H Tulsiani… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org

While word error rates of automatic speech recognition (ASR) systems have consistently
fallen, natural language understanding (NLU) applications built on top of ASR systems still …

Cited by 1

[BOOK][B] Understanding, Building, and Evaluating Models for Context Aware Conditional Natural Language Generation

DM Chan - 2024 - search.proquest.com

If you ask a human to describe an image, they might do so in a thousand different ways.
Each of these descriptions depends not only on the image but also on a rich tapestry of …


Content-context factorized representations for automated speech recognition

Self-supervised speech representation learning: A review
