Google Scholar

Voicebox: Text-guided multilingual universal speech generation at scale

M Le, A Vyas, B Shi, B Karrer, L Sari… - Advances in neural …, 2023 - proceedings.neurips.cc

Large-scale generative models such as GPT and DALL-E have revolutionized the research
community. These models not only generate high fidelity outputs, but are also generalists …

Cited by 261 - All 9 versions

[PDF] arxiv.org

SeamlessM4T: Massively Multilingual & Multimodal Machine Translation

L Barrault, YA Chung, MC Meglioli, D Dale… - arxiv preprint arxiv …, 2023 - arxiv.org

What does it take to create the Babel Fish, a tool that can help individuals translate speech
between any two languages? While recent breakthroughs in text-based models have …

Cited by 114

[PDF] neurips.cc

Textually pretrained speech language models

M Hassid, T Remez, TA Nguyen, I Gat… - Advances in …, 2023 - proceedings.neurips.cc

Speech language models (SpeechLMs) process and generate acoustic data only, without
textual supervision. In this work, we propose TWIST, a method for training SpeechLMs using …

Cited by 61 - All 5 versions

[PDF] arxiv.org

Textless speech-to-speech translation on real data

A Lee, H Gong, PA Duquenne, H Schwenk… - arxiv preprint arxiv …, 2021 - arxiv.org

We present a textless speech-to-speech translation (S2ST) system that can translate speech
from one language into another language and can be built without the need of any text data …

Cited by 147 - All 6 versions

[PDF] arxiv.org

ESPnet2-TTS: Extending the edge of TTS research

T Hayashi, R Yamamoto, T Yoshimura, P Wu… - arxiv preprint arxiv …, 2021 - arxiv.org

This paper describes ESPnet2-TTS, an end-to-end text-to-speech (E2E-TTS) toolkit.
ESPnet2-TTS extends our earlier version, ESPnet-TTS, by adding many new features …

Cited by 70 - All 3 versions

[PDF] aclanthology.org

Improving grammatical error correction with multimodal feature integration

T Fang, J Hu, DF Wong, X Wan, LS Chao… - Findings of the …, 2023 - aclanthology.org

Grammatical error correction (GEC) is a promising task aimed at correcting errors in a text.
Many methods have been proposed to facilitate this task with remarkable results. However …

Cited by 19 - All 3 versions

[PDF] arxiv.org

Speaking style conversion in the waveform domain using discrete self-supervised units

G Maimon, Y Adi - arxiv preprint arxiv:2212.09730, 2022 - arxiv.org

We introduce DISSC, a novel, lightweight method that converts the rhythm, pitch contour and
timbre of a recording to a target speaker in a textless manner. Unlike DISSC, most voice …

Cited by 22 - All 6 versions

[PDF] ed.ac.uk

Phonetic analysis of self-supervised representations of English speech

D Wells, H Tang, K Richmond - 23rd Annual Conference of the …, 2022 - research.ed.ac.uk

We present an analysis of discrete units discovered via self-supervised representation
learning on English speech. We focus on units produced by a pre-trained HuBERT model …

Cited by 27 - All 4 versions

[PDF] arxiv.org

A holistic cascade system, benchmark, and human evaluation protocol for expressive speech-to-speech translation

WC Huang, B Peloquin, J Kao, C Wang… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org

Expressive speech-to-speech translation (S2ST) aims to transfer prosodic attributes of
source speech to target speech while maintaining translation accuracy. Existing research in …

Cited by 19 - All 5 versions

[PDF] arxiv.org

Scaling properties of speech language models

S Cuervo, R Marxer - arxiv preprint arxiv:2404.00685, 2024 - arxiv.org

Speech Language Models (SLMs) aim to learn language from raw audio, without textual
resources. Despite significant advances, our current models exhibit weak syntax and …

Cited by 5 - All 6 versions

Saved to my library

fairseq S^2: A scalable and integrable speech synthesis toolkit

Voicebox: Text-guided multilingual universal speech generation at scale

SeamlessM4T: Massively Multilingual & Multimodal Machine Translation

Textually pretrained speech language models

Textless speech-to-speech translation on real data

ESPnet2-TTS: Extending the edge of TTS research

Improving grammatical error correction with multimodal feature integration

Speaking style conversion in the waveform domain using discrete self-supervised units

Phonetic analysis of self-supervised representations of English speech

A holistic cascade system, benchmark, and human evaluation protocol for expressive speech-to-speech translation

Scaling properties of speech language models