Optimal transport aggregation for visual place recognition

S Izquierdo, J Civera - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
The task of Visual Place Recognition (VPR) aims to match a query image against
references from an extensive database of images from different places relying solely on …

Matte anything: Interactive natural image matting with segment anything model

J Yao, X Wang, L Ye, W Liu - Image and Vision Computing, 2024 - Elsevier
Natural image matting algorithms aim to predict the transparency map (alpha-matte) with the
trimap guidance. However, the production of trimap often requires significant labor, which …

Exploring the Synergies of Hybrid CNNs and ViTs Architectures for Computer Vision: A survey

H Yunusa, S Qin, AHA Chukkol, AA Yusuf… - arXiv preprint arXiv …, 2024 - arxiv.org
The hybrid of Convolutional Neural Network (CNN) and Vision Transformers (ViT)
architectures has emerged as a groundbreaking approach, pushing the boundaries of …

Diffusion for natural image matting

Y Hu, Y Lin, W Wang, Y Zhao, Y Wei, H Shi - European Conference on …, 2024 - Springer
Existing natural image matting algorithms inevitably have flaws in their predictions on
difficult cases, and their one-step prediction manner cannot further correct these errors. In …

Exploring the synergies of hybrid convolutional neural network and Vision Transformer architectures for computer vision: A survey

Y Haruna, S Qin, AHA Chukkol, AA Yusuf, I Bello… - … Applications of Artificial …, 2025 - Elsevier
The hybrid of Convolutional Neural Network (CNN) and Vision Transformer (ViT)
architecture has emerged as a groundbreaking approach, pushing the boundaries of …

EndoDAC: Efficient adapting foundation model for self-supervised depth estimation from any endoscopic camera

B Cui, M Islam, L Bai, A Wang, H Ren - International Conference on …, 2024 - Springer
Depth estimation plays a crucial role in various tasks within endoscopic surgery, including
navigation, surface reconstruction, and augmented reality visualization. Despite the …

Transparent image layer diffusion using latent transparency

L Zhang, M Agrawala - arXiv preprint arXiv:2402.17113, 2024 - arxiv.org
We present LayerDiffusion, an approach enabling large-scale pretrained latent diffusion
models to generate transparent images. The method allows generation of single transparent …

SparseDC: Depth Completion from sparse and non-uniform inputs

C Long, W Zhang, Z Chen, H Wang, Y Liu, P Tong… - Information …, 2024 - Elsevier
We propose SparseDC, a model for Depth Completion from Sparse and non-uniform inputs.
Unlike previous methods focusing on completing fixed distributions on benchmark datasets …

MULAN: A Multi Layer Annotated Dataset for Controllable Text-to-Image Generation

PD Tudosiu, Y Yang, S Zhang, F Chen… - Proceedings of the …, 2024 - openaccess.thecvf.com
Text-to-image generation has achieved astonishing results yet precise spatial controllability
and prompt fidelity remain highly challenging. This limitation is typically addressed through …

Unifying Automatic and Interactive Matting with Pretrained ViTs

Z Ye, W Liu, H Guo, Y Liang, C Hong… - Proceedings of the …, 2024 - openaccess.thecvf.com
Automatic and interactive matting largely improve image matting by respectively alleviating
the need for auxiliary input and enabling object selection. Due to different settings on …