From Google Gemini to OpenAI Q* (Q-Star): A survey of reshaping the generative artificial intelligence (AI) research landscape
This comprehensive survey explored the evolving landscape of generative Artificial
Intelligence (AI), with a specific focus on the transformative impacts of Mixture of Experts …
MobileNetV4: Universal Models for the Mobile Ecosystem
We present the latest generation of MobileNets: MobileNetV4 (MNv4). They feature
universally-efficient architecture designs for mobile devices. We introduce the Universal …
RepViT: Revisiting mobile CNN from ViT perspective
Recently lightweight Vision Transformers (ViTs) demonstrate superior performance
and lower latency compared with lightweight Convolutional Neural Networks (CNNs) on …
Faster segment anything: Towards lightweight SAM for mobile applications
Segment anything model (SAM) is a prompt-guided vision foundation model for cutting out
the object of interest from its background. Since Meta research team released the SA project …
U-KAN makes strong backbone for medical image segmentation and generation
U-Net has become a cornerstone in various visual applications such as image segmentation
and diffusion probability models. While numerous innovative designs and improvements …
EfficientSAM: Leveraged masked image pretraining for efficient segment anything
Segment Anything Model (SAM) has emerged as a powerful tool for numerous
vision applications. A key component that drives the impressive performance for zero-shot …
A survey on transformer compression
Large models based on the Transformer architecture play increasingly vital roles in artificial
intelligence, particularly within the realms of natural language processing (NLP) and …
Depth Pro: Sharp monocular metric depth in less than a second
A Bochkovskii, A Delaunoy, H Germain… - arXiv preprint arXiv …, 2024 - arxiv.org
We present a foundation model for zero-shot metric monocular depth estimation. Our model,
Depth Pro, synthesizes high-resolution depth maps with unparalleled sharpness and high …
MobileCLIP: Fast image-text models through multi-modal reinforced training
Contrastive pre-training of image-text foundation models such as CLIP demonstrated
excellent zero-shot performance and improved robustness on a wide range of downstream …
Security of target recognition for UAV forestry remote sensing based on multi-source data fusion transformer framework
Unmanned Aerial Vehicle (UAV) remote sensing object recognition plays a vital
role in a variety of sectors including military, agriculture, forestry, and construction. Accurate …