A high-performance neuroprosthesis for speech decoding and avatar control

SL Metzger, KT Littlejohn, AB Silva, DA Moses… - Nature, 2023 - nature.com
Speech neuroprostheses have the potential to restore communication to people living with
paralysis, but naturalistic speed and expressivity are elusive. Here we use high-density …

High fidelity neural audio compression

A Défossez, J Copet, G Synnaeve, Y Adi - arXiv preprint arXiv:2210.13438, 2022 - arxiv.org
We introduce a state-of-the-art, real-time, high-fidelity audio codec leveraging neural
networks. It consists of a streaming encoder-decoder architecture with quantized latent …
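
The codec described in this entry was released as the open-source encodec Python package. As a minimal sketch, assuming that package plus torchaudio are installed and that "input.wav" is a placeholder file, a round trip through the quantized latent codes at a 6 kbps target bandwidth looks roughly like this:

# Round trip through EnCodec's quantized latent codes (sketch; paths are placeholders).
import torch
import torchaudio
from encodec import EncodecModel
from encodec.utils import convert_audio

# Pretrained 24 kHz streaming model; target bandwidth is set in kbps.
model = EncodecModel.encodec_model_24khz()
model.set_target_bandwidth(6.0)

# Resample/remix the input to the model's expected sample rate and channel count.
wav, sr = torchaudio.load("input.wav")
wav = convert_audio(wav, sr, model.sample_rate, model.channels).unsqueeze(0)

with torch.no_grad():
    frames = model.encode(wav)                          # list of (codes, scale) chunks
    codes = torch.cat([c for c, _ in frames], dim=-1)   # [batch, n_codebooks, time]
    reconstruction = model.decode(frames)               # waveform from the discrete codes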

AudioGen: Textually guided audio generation

F Kreuk, G Synnaeve, A Polyak, U Singer… - arXiv preprint arXiv …, 2022 - arxiv.org
We tackle the problem of generating audio samples conditioned on descriptive text captions.
In this work, we propose AudioGen, an auto-regressive generative model that generates …
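
AudioGen is distributed through Meta's AudioCraft library. The sketch below assumes that library and the publicly released facebook/audiogen-medium checkpoint, and mirrors its documented caption-conditioned generation loop; the captions and output names are illustrative:

# Text-conditioned audio generation with AudioGen via audiocraft (sketch).
from audiocraft.models import AudioGen
from audiocraft.data.audio import audio_write

model = AudioGen.get_pretrained("facebook/audiogen-medium")
model.set_generation_params(duration=5)  # seconds of audio per sample

descriptions = ["dog barking in the distance", "sirens of an emergency vehicle"]
wavs = model.generate(descriptions)      # one waveform per caption, decoded auto-regressively

for idx, one_wav in enumerate(wavs):
    # Writes 0.wav, 1.wav, ... with loudness normalization.
    audio_write(f"{idx}", one_wav.cpu(), model.sample_rate, strategy="loudness")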

Foundation models for music: A survey

Y Ma, A Øland, A Ragni, BMS Del Sette, C Saitis… - arXiv preprint arXiv …, 2024 - arxiv.org
In recent years, foundation models (FMs) such as large language models (LLMs) and latent
diffusion models (LDMs) have profoundly impacted diverse sectors, including music. This …

Scaling laws for generative mixed-modal language models

A Aghajanyan, L Yu, A Conneau… - International …, 2023 - proceedings.mlr.press
Generative language models define distributions over sequences of tokens that can
represent essentially any combination of data modalities (e.g., any permutation of image …

Textually pretrained speech language models

M Hassid, T Remez, TA Nguyen, I Gat… - Advances in …, 2024 - proceedings.neurips.cc
Speech language models (SpeechLMs) process and generate acoustic data only, without
textual supervision. In this work, we propose TWIST, a method for training SpeechLMs using …

SpiRit-LM: Interleaved Spoken and Written Language Model

TA Nguyen, B Muller, B Yu, MR Costa-Jussa… - Transactions of the …, 2025 - direct.mit.edu
We introduce SpiRit-LM, a foundation multimodal language model that freely mixes text and
speech. Our model is based on a 7B pretrained text language model that we extend to the …

SeamlessM4T: Massively Multilingual & Multimodal Machine Translation

L Barrault, YA Chung, MC Meglioli, D Dale… - arXiv preprint arXiv …, 2023 - arxiv.org
What does it take to create the Babel Fish, a tool that can help individuals translate speech
between any two languages? While recent breakthroughs in text-based models have …
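
SeamlessM4T checkpoints are also exposed through Hugging Face Transformers. As a hedged sketch, assuming a recent transformers release, the facebook/hf-seamless-m4t-medium checkpoint, and a placeholder "speech.wav" recording, speech-to-text translation into French might look like this:

# Speech-to-text translation with SeamlessM4T through Hugging Face Transformers (sketch).
import torchaudio
from transformers import AutoProcessor, SeamlessM4TModel

processor = AutoProcessor.from_pretrained("facebook/hf-seamless-m4t-medium")
model = SeamlessM4TModel.from_pretrained("facebook/hf-seamless-m4t-medium")

# "speech.wav" is a placeholder; the model expects 16 kHz mono audio.
wav, sr = torchaudio.load("speech.wav")
wav = torchaudio.functional.resample(wav, sr, 16_000).mean(dim=0)

inputs = processor(audios=wav.numpy(), sampling_rate=16_000, return_tensors="pt")
tokens = model.generate(**inputs, tgt_lang="fra", generate_speech=False)
print(processor.decode(tokens[0].tolist()[0], skip_special_tokens=True))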

Seamless: Multilingual Expressive and Streaming Speech Translation

L Barrault, YA Chung, MC Meglioli, D Dale… - arXiv preprint arXiv …, 2023 - arxiv.org
Large-scale automatic speech translation systems today lack key features that help machine-
mediated communication feel seamless when compared to human-to-human dialogue. In …

CVSS corpus and massively multilingual speech-to-speech translation

Y Jia, MT Ramanovich, Q Wang, H Zen - arXiv preprint arXiv:2201.03713, 2022 - arxiv.org
We introduce CVSS, a massively multilingual-to-English speech-to-speech translation
(S2ST) corpus, covering sentence-level parallel S2ST pairs from 21 languages into English …