

Svformer: Semi-supervised video transformer for action recognition

Z Xing, Q Dai, H Hu, J Chen, Z Wu… - Proceedings of the …, 2023 - openaccess.thecvf.com

Semi-supervised action recognition is a challenging but critical task due to the high cost of
video annotations. Existing approaches mainly use convolutional neural networks, yet …

XVO: Generalized visual odometry via cross-modal self-training

L Lai, Z Shangguan, J Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com

We propose XVO, a semi-supervised learning method for training generalized monocular
Visual Odometry (VO) models with robust off-the-shelf operation across diverse datasets and …

Panoswin: a pano-style swin transformer for panorama understanding

Z Ling, Z Xing, X Zhou, M Cao… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

In panorama understanding, the widely used equirectangular projection (ERP) entails
boundary discontinuity and spatial distortion. It severely deteriorates the conventional CNNs …

Few-shot single-view 3D reconstruction with memory prior contrastive network

Z Xing, Y Chen, Z Ling, X Zhou, Y Xiang - European Conference on …, 2022 - Springer

3D reconstruction of novel categories based on few-shot learning is appealing in
real-world applications and attracts increasing research interest. Previous approaches …

Chasing consistency in text-to-3D generation from a single image

Y Ouyang, W Chai, J Ye, D Tao, Y Zhan… - arXiv preprint arXiv …, 2023 - arxiv.org

Text-to-3D generation from a single-view image is a popular but challenging task in 3D
vision. Although numerous methods have been proposed, existing works still suffer from the …

Vidiff: Translating videos via multi-modal instructions with diffusion models

Z Xing, Q Dai, Z Zhang, H Zhang, H Hu, Z Wu… - arXiv preprint arXiv …, 2023 - arxiv.org

Diffusion models have achieved significant success in image and video generation. This
motivates a growing interest in video editing tasks, where videos are edited according to …

Garnet: Global-aware multi-view 3D reconstruction network and the cost-performance tradeoff

Z Zhu, L Yang, X Lin, L Yang, Y Liang - Pattern Recognition, 2023 - Elsevier

Deep learning technology has made great progress in multi-view 3D reconstruction tasks. At
present, the mainstream solutions adopt different ways to fuse the features from several …

Umiformer: Mining the correlations between similar tokens for multi-view 3D reconstruction

Z Zhu, L Yang, N Li, C Jiang… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

In recent years, many video tasks have achieved breakthroughs by utilizing the vision
transformer and establishing spatial-temporal decoupling for feature extraction. Although …