Towards audio language modeling – an overview

H Wu, X Chen, YC Lin, K Chang, HL Chung… - arXiv preprint arXiv …, 2024 - arxiv.org
Neural audio codecs were initially introduced to compress audio data into compact codes to
reduce transmission latency. Researchers recently discovered the potential of codecs as …
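
The compress-then-tokenize idea this abstract describes usually rests on residual vector quantization (RVQ), the discretization scheme most neural codecs use to turn encoder frames into compact code sequences. Below is a minimal NumPy sketch of RVQ under illustrative assumptions (toy codebook sizes, random codebooks, random frames standing in for encoder output); it is not any particular codec's implementation.

    import numpy as np

    rng = np.random.default_rng(0)

    def rvq_encode(frames, codebooks):
        """Quantize each frame with a stack of codebooks, each stage
        coding the residual left by the previous one."""
        residual = frames.copy()
        codes = []
        for cb in codebooks:                      # cb: (codebook_size, dim)
            # nearest codeword per frame (squared Euclidean distance)
            d = ((residual[:, None, :] - cb[None, :, :]) ** 2).sum(-1)
            idx = d.argmin(axis=1)
            codes.append(idx)
            residual = residual - cb[idx]         # pass the residual onward
        return np.stack(codes, axis=1)            # (n_frames, n_stages)

    def rvq_decode(codes, codebooks):
        """Sum the selected codewords from every stage."""
        return sum(cb[codes[:, s]] for s, cb in enumerate(codebooks))

    dim, n_stages, cb_size = 8, 4, 16             # illustrative sizes
    codebooks = [rng.normal(size=(cb_size, dim)) for _ in range(n_stages)]
    frames = rng.normal(size=(10, dim))           # stand-in encoder output

    codes = rvq_encode(frames, codebooks)
    recon = rvq_decode(codes, codebooks)
    print(codes.shape, float(((frames - recon) ** 2).mean()))

Each added stage refines the reconstruction, which is why codecs can trade bitrate for quality simply by keeping fewer or more code streams.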

Foundation models for music: A survey

Y Ma, A Øland, A Ragni, BMS Del Sette, C Saitis… - arXiv preprint arXiv …, 2024 - arxiv.org
In recent years, foundation models (FMs) such as large language models (LLMs) and latent
diffusion models (LDMs) have profoundly impacted diverse sectors, including music. This …

GIVT: Generative Infinite-Vocabulary Transformers

M Tschannen, C Eastwood, F Mentzer - European Conference on …, 2024 - Springer
We introduce Generative Infinite-Vocabulary Transformers (GIVT), which generate
vector sequences with real-valued entries, instead of discrete tokens from a finite …
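
Generating "vector sequences with real-valued entries" instead of discrete tokens means replacing the softmax head with a continuous output distribution; GIVT parameterizes a Gaussian mixture. The sketch below shows that idea in miniature, with all projection matrices, widths, and the mixture size as illustrative stand-ins rather than the paper's actual architecture.

    import numpy as np

    rng = np.random.default_rng(0)
    d_model, d_out, n_mix = 32, 8, 4              # illustrative sizes

    # Stand-ins for learned projections from the decoder hidden state.
    W_pi = rng.normal(size=(d_model, n_mix)) * 0.1
    W_mu = rng.normal(size=(d_model, n_mix * d_out)) * 0.1
    W_sig = rng.normal(size=(d_model, n_mix * d_out)) * 0.1

    def sample_next_vector(h):
        """Sample one real-valued vector from the mixture defined by h."""
        logits = h @ W_pi
        pi = np.exp(logits - logits.max())
        pi /= pi.sum()                            # mixture weights
        mu = (h @ W_mu).reshape(n_mix, d_out)     # component means
        sigma = np.exp((h @ W_sig).reshape(n_mix, d_out))  # diagonal scales
        k = rng.choice(n_mix, p=pi)               # pick a component
        return mu[k] + sigma[k] * rng.normal(size=d_out)   # draw from it

    h = rng.normal(size=d_model)                  # stand-in decoder state
    print(sample_next_vector(h))

Because the output space is continuous, no finite vocabulary (and no VQ codebook) is needed between the model and the latent representation.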

SALM: Speech-augmented language model with in-context learning for speech recognition and translation

Z Chen, H Huang, A Andrusenko… - ICASSP 2024 …, 2024 - ieeexplore.ieee.org
We present a novel Speech Augmented Language Model (SALM) with multitask and in-context
learning capabilities. SALM comprises a frozen text LLM, an audio encoder, a …
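
The "frozen text LLM plus audio encoder" recipe typically amounts to projecting audio features into the LLM's embedding space and prepending them to the text prompt, so only a small adapter is trained. A shape-level sketch under assumed dimensions follows; the projector and all sizes are hypothetical, not SALM's code.

    import numpy as np

    rng = np.random.default_rng(0)
    d_audio, d_llm = 512, 1024                    # illustrative widths

    audio_feats = rng.normal(size=(120, d_audio)) # frozen audio encoder output
    prompt_embs = rng.normal(size=(16, d_llm))    # frozen LLM token embeddings

    # The only trainable piece in this sketch: a linear modality adapter.
    W_proj = rng.normal(size=(d_audio, d_llm)) * 0.02

    audio_embs = audio_feats @ W_proj             # map audio into LLM space
    inputs = np.concatenate([audio_embs, prompt_embs], axis=0)

    # `inputs` would replace ordinary token embeddings at the frozen LLM's
    # input; gradients flow only into W_proj.
    print(inputs.shape)                           # (136, 1024)

Keeping the LLM frozen preserves its text abilities while the adapter learns to present audio in a form the model already understands.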

LLMs meet multimodal generation and editing: A survey

Y He, Z Liu, J Chen, Z Tian, H Liu, X Chi, R Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
With recent advancements in large language models (LLMs), there is growing interest in
combining LLMs with multimodal learning. Previous surveys of multimodal large language …

Codec-SUPERB: An in-depth analysis of sound codec models

H Wu, HL Chung, YC Lin, YK Wu, X Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
The sound codec's dual roles of minimizing data transmission latency and serving as a
tokenizer underscore its critical importance. Recent years have witnessed significant …
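
Benchmarking codec reconstructions means scoring the decoded waveform against the original; one widely used signal-level number is scale-invariant SNR (SI-SNR), sketched below as a reference implementation. Whether this exact metric appears in Codec-SUPERB's suite is an assumption here; treat the sketch as one illustrative example.

    import numpy as np

    def si_snr(reference, estimate, eps=1e-8):
        """Scale-invariant signal-to-noise ratio in dB between waveforms."""
        reference = reference - reference.mean()
        estimate = estimate - estimate.mean()
        # Project the estimate onto the reference to cancel gain differences.
        scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
        target = scale * reference
        noise = estimate - target
        return 10 * np.log10(np.dot(target, target) / (np.dot(noise, noise) + eps))

    rng = np.random.default_rng(0)
    clean = rng.normal(size=16000)                  # 1 s of audio at 16 kHz
    decoded = clean + 0.1 * rng.normal(size=16000)  # stand-in reconstruction
    print(round(si_snr(clean, decoded), 2))         # roughly 20 dB here

The scale invariance matters because a codec that merely attenuates the signal should not be penalized for gain alone.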

VALL-T: Decoder-only generative transducer for robust and decoding-controllable text-to-speech

C Du, Y Guo, H Wang, Y Yang, Z Niu, S Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent TTS models with a decoder-only Transformer architecture, such as SPEAR-TTS and
VALL-E, achieve impressive naturalness and demonstrate the ability for zero-shot …

Autoregressive diffusion transformer for text-to-speech synthesis

Z Liu, S Wang, S Inoue, Q Bai, H Li - arXiv preprint arXiv:2406.05551, 2024 - arxiv.org
Audio language models have recently emerged as a promising approach for various audio
generation tasks, relying on audio tokenizers to encode waveforms into sequences of …
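
The premise that audio language models "rely on audio tokenizers to encode waveforms into sequences" implies generation is an ordinary autoregressive sampling loop over codec tokens. The toy loop below substitutes a random distribution for a trained Transformer; the vocabulary size, BOS/EOS ids, and length cap are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    vocab_size, eos_id, max_len = 1024, 0, 50     # illustrative codec vocab

    def next_token_probs(tokens):
        """Stand-in for a trained model's conditional distribution
        (the prefix `tokens` is ignored by this toy version)."""
        logits = rng.normal(size=vocab_size)
        p = np.exp(logits - logits.max())
        return p / p.sum()

    tokens = [1]                                  # assumed BOS code
    while len(tokens) < max_len:
        p = next_token_probs(tokens)
        t = rng.choice(vocab_size, p=p)           # sample next codec token
        if t == eos_id:
            break
        tokens.append(int(t))

    # `tokens` would be handed to the codec decoder to synthesize a waveform.
    print(len(tokens), tokens[:8])

This is exactly the loop the paper's diffusion-based alternative seeks to avoid running over a discrete vocabulary.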

Mini-Omni2: Towards open-source GPT-4o with vision, speech and duplex capabilities

Z **e, C Wu - arxiv preprint arxiv:2410.11190, 2024 - arxiv.org
GPT-4o, an all-encompassing model, represents a milestone in the development of large
multi-modal language models. It can understand visual, auditory, and textual modalities …

Boosting large language model for speech synthesis: An empirical study

H Hao, L Zhou, S Liu, J Li, S Hu, R Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) have made significant advancements in natural language
processing and are concurrently extending their language ability to other modalities, such as …