Google Scholar

Advancements in Generative AI: A Comprehensive Review of GANs, GPT, Autoencoders, Diffusion Model, and Transformers.

S Bengesi, H El-Sayed, MK Sarker, Y Houkpati… - IEEE …, 2024 - ieeexplore.ieee.org

The launch of ChatGPT in 2022 garnered global attention, marking a significant milestone in
the Generative Artificial Intelligence (GAI) field. While GAI has been in effect for the past …

Cited by 132. Versions: 6.

[PDF] arxiv.org

Promises and challenges of generative artificial intelligence for human learning

L Yan, S Greiff, Z Teuber, D Gašević - Nature Human Behaviour, 2024 - nature.com

Generative artificial intelligence (GenAI) holds the potential to transform the delivery,
cultivation and evaluation of human learning. Here the authors examine the integration of …

Cited by 29. Versions: 8.

[PDF] arxiv.org

The Llama 3 herd of models

A Dubey, A Jauhri, A Pandey, A Kadian… - arXiv preprint arXiv …, 2024 - arxiv.org

Modern artificial intelligence (AI) systems are powered by foundation models. This paper
presents a new set of foundation models, called Llama 3. It is a herd of language models …

Cited by 2867. Versions: 4.

[PDF] arxiv.org

Qwen-Audio: Advancing universal audio understanding via unified large-scale audio-language models

Y Chu, J Xu, X Zhou, Q Yang, S Zhang, Z Yan… - arXiv preprint arXiv …, 2023 - arxiv.org

Recently, instruction-following audio-language models have received broad attention for
audio interaction with humans. However, the absence of pre-trained audio models capable …

Cited by 237. Versions: 2.

[PDF] arxiv.org

NaturalSpeech 2: Latent diffusion models are natural and zero-shot speech and singing synthesizers

K Shen, Z Ju, X Tan, Y Liu, Y Leng, L He, T Qin… - arXiv preprint arXiv …, 2023 - arxiv.org

Scaling text-to-speech (TTS) to large-scale, multi-speaker, and in-the-wild datasets is
important to capture the diversity in human speech such as speaker identities, prosodies …

Cited by 229. Versions: 4.

[PDF] arxiv.org

NaturalSpeech 3: Zero-shot speech synthesis with factorized codec and diffusion models

Z Ju, Y Wang, K Shen, X Tan, D Xin, D Yang… - arXiv preprint arXiv …, 2024 - arxiv.org

While recent large-scale text-to-speech (TTS) models have achieved significant progress,
they still fall short in speech quality, similarity, and prosody. Considering speech intricately …

Cited by 145. Versions: 8.

[PDF] neurips.cc

Understanding diffusion objectives as the ELBO with simple data augmentation

D Kingma, R Gao - Advances in Neural Information …, 2023 - proceedings.neurips.cc

To achieve the highest perceptual quality, state-of-the-art diffusion models are optimized
with objectives that typically look very different from the maximum likelihood and the …

Cited by 106. Versions: 6.

[PDF] arxiv.org

Seamless: Multilingual Expressive and Streaming Speech Translation

L Barrault, YA Chung, MC Meglioli, D Dale… - arXiv preprint arXiv …, 2023 - arxiv.org

Large-scale automatic speech translation systems today lack key features that help machine-
mediated communication feel seamless when compared to human-to-human dialogue. In …

Cited by 109. Versions: 2.

[PDF] arxiv.org

XTTS: a massively multilingual zero-shot text-to-speech model

E Casanova, K Davis, E Gölge, G Göknar… - arXiv preprint arXiv …, 2024 - arxiv.org

Most Zero-shot Multi-speaker TTS (ZS-TTS) systems support only a single language.
Although models like YourTTS, VALL-E X, Mega-TTS 2, and Voicebox explored Multilingual …

Cited by 71. Versions: 4.

[PDF] arxiv.org

VALL-E 2: Neural codec language models are human parity zero-shot text to speech synthesizers

S Chen, S Liu, L Zhou, Y Liu, X Tan, J Li, S Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org

This paper introduces VALL-E 2, the latest advancement in neural codec language models
that marks a milestone in zero-shot text-to-speech synthesis (TTS), achieving human parity …

Cited by 59. Versions: 3.

Voicebox: Text-guided multilingual universal speech generation at scale