Optimal transport aggregation for visual place recognition
S Izquierdo, J Civera - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
The task of Visual Place Recognition (VPR) aims to match a query image against
references from an extensive database of images from different places relying solely on …
Matte anything: Interactive natural image matting with segment anything model
Natural image matting algorithms aim to predict the transparency map (alpha-matte) with the
trimap guidance. However, the production of trimap often requires significant labor, which …
Exploring the Synergies of Hybrid CNNs and ViTs Architectures for Computer Vision: A survey
The hybrid of Convolutional Neural Network (CNN) and Vision Transformers (ViT)
architectures has emerged as a groundbreaking approach, pushing the boundaries of …
Diffusion for natural image matting
Existing natural image matting algorithms inevitably have flaws in their predictions on
difficult cases, and their one-step prediction manner cannot further correct these errors. In …
Exploring the synergies of hybrid convolutional neural network and Vision Transformer architectures for computer vision: A survey
The hybrid of Convolutional Neural Network (CNN) and Vision Transformer (ViT)
architecture has emerged as a groundbreaking approach, pushing the boundaries of …
Endodac: Efficient adapting foundation model for self-supervised depth estimation from any endoscopic camera
Depth estimation plays a crucial role in various tasks within endoscopic surgery, including
navigation, surface reconstruction, and augmented reality visualization. Despite the …
Transparent image layer diffusion using latent transparency
L Zhang, M Agrawala - arXiv preprint arXiv:2402.17113, 2024 - arxiv.org
We present LayerDiffusion, an approach enabling large-scale pretrained latent diffusion
models to generate transparent images. The method allows generation of single transparent …
SparseDC: Depth Completion from sparse and non-uniform inputs
We propose SparseDC, a model for Depth Completion from Sparse and non-uniform inputs.
Unlike previous methods focusing on completing fixed distributions on benchmark datasets …
MULAN: A Multi Layer Annotated Dataset for Controllable Text-to-Image Generation
PD Tudosiu, Y Yang, S Zhang, F Chen… - Proceedings of the …, 2024 - openaccess.thecvf.com
Text-to-image generation has achieved astonishing results yet precise spatial controllability
and prompt fidelity remain highly challenging. This limitation is typically addressed through …
Unifying Automatic and Interactive Matting with Pretrained ViTs
Z Ye, W Liu, H Guo, Y Liang, C Hong… - Proceedings of the …, 2024 - openaccess.thecvf.com
Automatic and interactive matting largely improve image matting by respectively alleviating
the need for auxiliary input and enabling object selection. Due to different settings on …