SimDA: Simple diffusion adapter for efficient video generation

Z Xing, Q Dai, H Hu, Z Wu… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
The recent wave of AI-generated content has witnessed the great development and success
of Text-to-Image (T2I) technologies. By contrast, Text-to-Video (T2V) still falls short of …

SVFormer: Semi-supervised video transformer for action recognition

Z Xing, Q Dai, H Hu, J Chen, Z Wu… - Proceedings of the …, 2023 - openaccess.thecvf.com
Semi-supervised action recognition is a challenging but critical task due to the high cost of
video annotations. Existing approaches mainly use convolutional neural networks, yet …

XVO: Generalized visual odometry via cross-modal self-training

L Lai, Z Shangguan, J Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
We propose XVO, a semi-supervised learning method for training generalized monocular
Visual Odometry (VO) models with robust off-the-shelf operation across diverse datasets and …

PanoSwin: A pano-style Swin transformer for panorama understanding

Z Ling, Z Xing, X Zhou, M Cao… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
In panorama understanding, the widely used equirectangular projection (ERP) entails
boundary discontinuity and spatial distortion. It severely deteriorates the conventional CNNs …

Few-shot single-view 3D reconstruction with memory prior contrastive network

Z Xing, Y Chen, Z Ling, X Zhou, Y Xiang - European Conference on …, 2022 - Springer
3D reconstruction of novel categories based on few-shot learning is appealing in
real-world applications and attracts increasing research interest. Previous approaches …

Chasing consistency in text-to-3D generation from a single image

Y Ouyang, W Chai, J Ye, D Tao, Y Zhan… - arXiv preprint arXiv …, 2023 - arxiv.org
Text-to-3D generation from a single-view image is a popular but challenging task in 3D
vision. Although numerous methods have been proposed, existing works still suffer from the …

VIDiff: Translating videos via multi-modal instructions with diffusion models

Z Xing, Q Dai, Z Zhang, H Zhang, H Hu, Z Wu… - arXiv preprint arXiv …, 2023 - arxiv.org
Diffusion models have achieved significant success in image and video generation. This
motivates a growing interest in video editing tasks, where videos are edited according to …

GARNet: Global-aware multi-view 3D reconstruction network and the cost-performance tradeoff

Z Zhu, L Yang, X Lin, L Yang, Y Liang - Pattern Recognition, 2023 - Elsevier
Deep learning technology has made great progress in multi-view 3D reconstruction tasks. At
present, the mainstream solutions adopt different ways to fuse the features from several …

UMIFormer: Mining the correlations between similar tokens for multi-view 3D reconstruction

Z Zhu, L Yang, N Li, C Jiang… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
In recent years, many video tasks have achieved breakthroughs by utilizing the vision
transformer and establishing spatial-temporal decoupling for feature extraction. Although …

FDGaussian: Fast Gaussian splatting from single image via geometric-aware diffusion model

Q Feng, Z Xing, Z Wu, YG Jiang - arXiv preprint arXiv:2403.10242, 2024 - arxiv.org
Reconstructing detailed 3D objects from single-view images remains a challenging task due
to the limited information available. In this paper, we introduce FDGaussian, a novel two …