Google Scholar

Learning with limited annotations: a survey on deep semi-supervised learning for medical image segmentation

R Jiao, Y Zhang, L Ding, B Xue, J Zhang, R Cai… - Computers in Biology …, 2024 - Elsevier

Medical image segmentation is a fundamental and critical step in many image-guided
clinical approaches. Recent success of deep learning-based segmentation methods usually …

Cited by: 190. All versions: 7

[PDF] arxiv.org

A comprehensive survey on segment anything model for vision and beyond

C Zhang, L Liu, Y Cui, G Huang, W Lin, Y Yang… - arxiv preprint arxiv …, 2023 - arxiv.org

Artificial intelligence (AI) is evolving towards artificial general intelligence, which refers to the
ability of an AI system to perform a wide range of tasks and exhibit a level of intelligence …

Cited by: 102. All versions: 2

[PDF] arxiv.org

Naturalspeech 2: Latent diffusion models are natural and zero-shot speech and singing synthesizers

K Shen, Z Ju, X Tan, Y Liu, Y Leng, L He, T Qin… - arxiv preprint arxiv …, 2023 - arxiv.org

Scaling text-to-speech (TTS) to large-scale, multi-speaker, and in-the-wild datasets is
important to capture the diversity in human speech such as speaker identities, prosodies …

Cited by: 226. All versions: 4

[PDF] arxiv.org

Naturalspeech 3: Zero-shot speech synthesis with factorized codec and diffusion models

Z Ju, Y Wang, K Shen, X Tan, D Xin, D Yang… - arxiv preprint arxiv …, 2024 - arxiv.org

While recent large-scale text-to-speech (TTS) models have achieved significant progress,
they still fall short in speech quality, similarity, and prosody. Considering speech intricately …

Cited by: 143. All versions: 8

[PDF] arxiv.org

Uniaudio: An audio foundation model toward universal audio generation

D Yang, J Tian, X Tan, R Huang, S Liu, X Chang… - arxiv preprint arxiv …, 2023 - arxiv.org

Large Language models (LLM) have demonstrated the capability to handle a variety of
generative tasks. This paper presents the UniAudio system, which, unlike prior task-specific …

Cited by: 108. All versions: 3

[PDF] arxiv.org

Vall-e 2: Neural codec language models are human parity zero-shot text to speech synthesizers

S Chen, S Liu, L Zhou, Y Liu, X Tan, J Li, S Zhao… - arxiv preprint arxiv …, 2024 - arxiv.org

This paper introduces VALL-E 2, the latest advancement in neural codec language models
that marks a milestone in zero-shot text-to-speech synthesis (TTS), achieving human parity …

Cited by: 55. All versions: 3

[PDF] iop.org

Ten years of generative adversarial nets (GANs): a survey of the state-of-the-art

T Chakraborty, UR KS, SM Naik, M Panja… - Machine Learning …, 2024 - iopscience.iop.org

Generative adversarial networks (GANs) have rapidly emerged as powerful tools for
generating realistic and diverse data across various domains, including computer vision and …

Cited by: 70. All versions: 7

[PDF] arxiv.org

Speechx: Neural codec language model as a versatile speech transformer

X Wang, M Thakker, Z Chen, N Kanda… - … on Audio, Speech …, 2024 - ieeexplore.ieee.org

Recent advancements in generative speech models based on audio-text prompts have
enabled remarkable innovations like high-quality zero-shot text-to-speech. However …

Cited by: 69. All versions: 5

[PDF] arxiv.org

Gaussianformer: Scene as gaussians for vision-based 3d semantic occupancy prediction

Y Huang, W Zheng, Y Zhang, J Zhou, J Lu - European Conference on …, 2024 - Springer

3D semantic occupancy prediction aims to obtain 3D fine-grained geometry and
semantics of the surrounding scene and is an important task for the robustness of vision …

Cited by: 28. All versions: 7

[PDF] thecvf.com

LowRankOcc: tensor decomposition and low-rank recovery for vision-based 3D semantic occupancy prediction

L Zhao, X Xu, Z Wang, Y Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com

In this paper we present a tensor decomposition and low-rank recovery approach
(LowRankOcc) for vision-based 3D semantic occupancy prediction. Conventional methods …

Cited by: 13. All versions: 4

Saved in your library

Symphonize 3d semantic scene completion with contextual instance queries

Learning with limited annotations: a survey on deep semi-supervised learning for medical image segmentation

A comprehensive survey on segment anything model for vision and beyond

Naturalspeech 2: Latent diffusion models are natural and zero-shot speech and singing synthesizers

Naturalspeech 3: Zero-shot speech synthesis with factorized codec and diffusion models

Uniaudio: An audio foundation model toward universal audio generation

Vall-e 2: Neural codec language models are human parity zero-shot text to speech synthesizers

Ten years of generative adversarial nets (GANs): a survey of the state-of-the-art

Speechx: Neural codec language model as a versatile speech transformer

Gaussianformer: Scene as gaussians for vision-based 3d semantic occupancy prediction

LowRankOcc: tensor decomposition and low-rank recovery for vision-based 3D semantic occupancy prediction