Learning with limited annotations: a survey on deep semi-supervised learning for medical image segmentation

R Jiao, Y Zhang, L Ding, B Xue, J Zhang, R Cai… - Computers in Biology …, 2024 - Elsevier
Medical image segmentation is a fundamental and critical step in many image-guided
clinical approaches. Recent success of deep learning-based segmentation methods usually …

A comprehensive survey on Segment Anything Model for vision and beyond

C Zhang, L Liu, Y Cui, G Huang, W Lin, Y Yang… - arXiv preprint arXiv …, 2023 - arxiv.org
Artificial intelligence (AI) is evolving towards artificial general intelligence, which refers to the
ability of an AI system to perform a wide range of tasks and exhibit a level of intelligence …

NaturalSpeech 2: Latent diffusion models are natural and zero-shot speech and singing synthesizers

K Shen, Z Ju, X Tan, Y Liu, Y Leng, L He, T Qin… - arXiv preprint arXiv …, 2023 - arxiv.org
Scaling text-to-speech (TTS) to large-scale, multi-speaker, and in-the-wild datasets is
important to capture the diversity in human speech such as speaker identities, prosodies …

NaturalSpeech 3: Zero-shot speech synthesis with factorized codec and diffusion models

Z Ju, Y Wang, K Shen, X Tan, D Xin, D Yang… - arXiv preprint arXiv …, 2024 - arxiv.org
While recent large-scale text-to-speech (TTS) models have achieved significant progress,
they still fall short in speech quality, similarity, and prosody. Considering speech intricately …

UniAudio: An audio foundation model toward universal audio generation

D Yang, J Tian, X Tan, R Huang, S Liu, X Chang… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) have demonstrated the capability to handle a variety of
generative tasks. This paper presents the UniAudio system, which, unlike prior task-specific …

VALL-E 2: Neural codec language models are human parity zero-shot text to speech synthesizers

S Chen, S Liu, L Zhou, Y Liu, X Tan, J Li, S Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper introduces VALL-E 2, the latest advancement in neural codec language models
that marks a milestone in zero-shot text-to-speech synthesis (TTS), achieving human parity …

Ten years of generative adversarial nets (GANs): a survey of the state-of-the-art

T Chakraborty, UR KS, SM Naik, M Panja… - Machine Learning …, 2024 - iopscience.iop.org
Generative adversarial networks (GANs) have rapidly emerged as powerful tools for
generating realistic and diverse data across various domains, including computer vision and …

SpeechX: Neural codec language model as a versatile speech transformer

X Wang, M Thakker, Z Chen, N Kanda… - … on Audio, Speech …, 2024 - ieeexplore.ieee.org
Recent advancements in generative speech models based on audio-text prompts have
enabled remarkable innovations like high-quality zero-shot text-to-speech. However …

GaussianFormer: Scene as Gaussians for vision-based 3D semantic occupancy prediction

Y Huang, W Zheng, Y Zhang, J Zhou, J Lu - European Conference on …, 2024 - Springer
3D semantic occupancy prediction aims to obtain 3D fine-grained geometry and
semantics of the surrounding scene and is an important task for the robustness of vision …

LowRankOcc: tensor decomposition and low-rank recovery for vision-based 3D semantic occupancy prediction

L Zhao, X Xu, Z Wang, Y Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com
In this paper we present a tensor decomposition and low-rank recovery approach
(LowRankOcc) for vision-based 3D semantic occupancy prediction. Conventional methods …