From Google Gemini to OpenAI Q* (Q-Star): A survey of reshaping the generative artificial intelligence (AI) research landscape

TR McIntosh, T Susnjak, T Liu, P Watters… - arXiv preprint arXiv …, 2023 - arxiv.org
This comprehensive survey explored the evolving landscape of generative Artificial
Intelligence (AI), with a specific focus on the transformative impacts of Mixture of Experts …
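
The snippet singles out Mixture of Experts (MoE) as one of the surveyed architectural shifts. Below is a minimal sketch of token-level top-k expert routing, assuming PyTorch; the names MoELayer, num_experts, and top_k are illustrative, not anything specified by the survey.

```python
# Minimal Mixture-of-Experts layer with top-k token routing (illustrative).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, dim: int, num_experts: int = 4, top_k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
             for _ in range(num_experts)]
        )
        self.gate = nn.Linear(dim, num_experts)  # learned router
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, dim). The router picks top_k experts per token and
        # mixes their outputs with softmax-normalized gate weights.
        weights, idx = self.gate(x).topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                hit = idx[:, slot] == e
                if hit.any():
                    out[hit] += weights[hit, slot:slot + 1] * expert(x[hit])
        return out

tokens = torch.randn(8, 64)
print(MoELayer(64)(tokens).shape)  # torch.Size([8, 64])
```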

MobileNetV4: Universal Models for the Mobile Ecosystem

D Qin, C Leichner, M Delakis, M Fornoni, S Luo… - … on Computer Vision, 2024 - Springer
We present the latest generation of MobileNets: MobileNetV4 (MNv4). They feature
universally efficient architecture designs for mobile devices. We introduce the Universal …
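
For context on what those architecture designs build on: a minimal sketch of the inverted-bottleneck block (expand, depthwise convolve, project) used throughout the MobileNet family, in PyTorch. MNv4's Universal Inverted Bottleneck generalizes this pattern; this simplified version is illustrative only.

```python
# Classic inverted-bottleneck block: pointwise expand -> depthwise -> project.
import torch
import torch.nn as nn

class InvertedBottleneck(nn.Module):
    def __init__(self, dim: int, expand: int = 4, kernel: int = 3):
        super().__init__()
        hidden = dim * expand
        self.block = nn.Sequential(
            nn.Conv2d(dim, hidden, 1, bias=False),           # pointwise expand
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, hidden, kernel, padding=kernel // 2,
                      groups=hidden, bias=False),            # depthwise conv
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, dim, 1, bias=False),           # pointwise project
            nn.BatchNorm2d(dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.block(x)  # residual add at equal channel count

x = torch.randn(1, 32, 56, 56)
print(InvertedBottleneck(32)(x).shape)  # torch.Size([1, 32, 56, 56])
```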

RepViT: Revisiting mobile CNN from ViT perspective

A Wang, H Chen, Z Lin, J Han… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Recently, lightweight Vision Transformers (ViTs) have demonstrated superior performance
and lower latency compared with lightweight Convolutional Neural Networks (CNNs) on …

Faster Segment Anything: Towards lightweight SAM for mobile applications

C Zhang, D Han, Y Qiao, JU Kim, SH Bae… - arXiv preprint arXiv …, 2023 - arxiv.org
The Segment Anything Model (SAM) is a prompt-guided vision foundation model for cutting
out the object of interest from its background. Since the Meta research team released the SA project …
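
To make "prompt-guided" concrete, here is a short usage sketch with Meta's segment_anything package; MobileSAM-style variants keep the same predictor interface while swapping in a lighter image encoder. The checkpoint path and the all-zeros image below are placeholders.

```python
# Prompt-guided segmentation with segment_anything's SamPredictor.
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")  # placeholder path
predictor = SamPredictor(sam)

image = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in for a real RGB image
predictor.set_image(image)

# A single foreground click (label 1) prompts the mask decoder.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[320, 240]]),
    point_labels=np.array([1]),
)
print(masks.shape)  # (3, 480, 640): multimask output by default
```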

U-KAN makes strong backbone for medical image segmentation and generation

C Li, X Liu, W Li, C Wang, H Liu, Y Liu, Z Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
U-Net has become a cornerstone in various visual applications such as image segmentation
and diffusion probabilistic models. While numerous innovative designs and improvements …
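
As a reference point for the backbone being revisited, here is a toy U-Net-style encoder-decoder with one skip connection, in PyTorch. U-KAN's contribution is to rework such a backbone with KAN layers, which this sketch does not attempt; all names here are illustrative.

```python
# Toy single-level U-Net: encode, downsample, upsample, fuse via skip.
import torch
import torch.nn as nn

def conv_block(cin, cout):
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(cout, cout, 3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    def __init__(self, cin=1, cout=1, width=16):
        super().__init__()
        self.enc = conv_block(cin, width)
        self.down = nn.MaxPool2d(2)
        self.mid = conv_block(width, width * 2)
        self.up = nn.ConvTranspose2d(width * 2, width, 2, stride=2)
        self.dec = conv_block(width * 2, width)  # width (skip) + width (upsampled)
        self.head = nn.Conv2d(width, cout, 1)

    def forward(self, x):
        e = self.enc(x)                      # full-resolution features
        m = self.mid(self.down(e))           # bottleneck at half resolution
        d = self.dec(torch.cat([self.up(m), e], dim=1))  # skip connection
        return self.head(d)

x = torch.randn(1, 1, 64, 64)
print(TinyUNet()(x).shape)  # torch.Size([1, 1, 64, 64])
```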

EfficientSAM: Leveraged masked image pretraining for efficient Segment Anything

Y Xiong, B Varadarajan, L Wu… - Proceedings of the …, 2024 - openaccess.thecvf.com
Segment Anything Model (SAM) has emerged as a powerful tool for numerous
vision applications. A key component that drives the impressive performance for zero-shot …
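
The "masked image pretraining" in the title can be sketched with a SimMIM-style objective: replace masked patch embeddings with a learned token and reconstruct them from the visible context. This is a generic illustration, not EfficientSAM's exact SAMI recipe (which reconstructs SAM encoder features); every module name below is a placeholder.

```python
# SimMIM-style masked reconstruction objective on patch embeddings.
import torch
import torch.nn as nn

patches = torch.randn(8, 196, 768)      # (batch, patches, dim), e.g. a 14x14 grid
mask = torch.rand(8, 196) < 0.75        # mask roughly 75% of patches

encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(768, 8, batch_first=True), num_layers=2
)
decoder = nn.Linear(768, 768)           # toy reconstruction head
mask_token = nn.Parameter(torch.zeros(768))

# Swap masked patches for a shared learnable token, encode, reconstruct.
inp = torch.where(mask.unsqueeze(-1), mask_token.expand_as(patches), patches)
recon = decoder(encoder(inp))

# Loss only on masked positions, against the (stop-gradient) targets.
loss = ((recon - patches.detach()) ** 2)[mask].mean()
print(loss.item())
```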

A survey on Transformer compression

Y Tang, Y Wang, J Guo, Z Tu, K Han, H Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large models based on the Transformer architecture play increasingly vital roles in artificial
intelligence, particularly within the realms of natural language processing (NLP) and …
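
One concrete technique within the survey's scope is post-training dynamic quantization, shown below on a toy Transformer encoder via PyTorch's torch.ao.quantization.quantize_dynamic. This is a minimal sketch of the idea, not the survey's own code.

```python
# Post-training dynamic quantization of a Transformer's linear layers to int8.
import torch
import torch.nn as nn

model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=256, nhead=4, batch_first=True),
    num_layers=2,
)
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8  # weights stored as int8
)

x = torch.randn(1, 32, 256)
print(quantized(x).shape)  # same interface, smaller linear weights
```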

Depth Pro: Sharp monocular metric depth in less than a second

A Bochkovskii, A Delaunoy, H Germain… - arXiv preprint arXiv …, 2024 - arxiv.org
We present a foundation model for zero-shot metric monocular depth estimation. Our model,
Depth Pro, synthesizes high-resolution depth maps with unparalleled sharpness and high …

MobileCLIP: Fast image-text models through multi-modal reinforced training

PKA Vasu, H Pouransari, F Faghri… - Proceedings of the …, 2024 - openaccess.thecvf.com
Contrastive pre-training of image-text foundation models such as CLIP demonstrated
excellent zero-shot performance and improved robustness on a wide range of downstream …
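
The contrastive objective behind CLIP-style pre-training fits in a few lines: a symmetric cross-entropy over an image-text similarity matrix. The embeddings below are random stand-ins, and the temperature value is illustrative.

```python
# Symmetric InfoNCE loss over paired image and text embeddings.
import torch
import torch.nn.functional as F

img = F.normalize(torch.randn(16, 512), dim=-1)   # image embeddings
txt = F.normalize(torch.randn(16, 512), dim=-1)   # matching text embeddings
temperature = 0.07                                 # illustrative value

logits = img @ txt.t() / temperature               # (16, 16) similarity matrix
targets = torch.arange(16)                         # i-th image matches i-th text
loss = (F.cross_entropy(logits, targets) +
        F.cross_entropy(logits.t(), targets)) / 2  # both retrieval directions
print(loss.item())
```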

Security of target recognition for UAV forestry remote sensing based on multi-source data fusion transformer framework

H Feng, Q Li, W Wang, AK Bashir, AK Singh, J Xu… - Information …, 2024 - Elsevier
Unmanned Aerial Vehicle (UAV) remote sensing object recognition plays a vital
role in a variety of sectors including military, agriculture, forestry, and construction. Accurate …
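
A common building block in multi-source fusion transformers is cross-attention between sensor streams, sketched below in PyTorch. This is a generic illustration rather than the paper's specific framework; both token tensors are random stand-ins for features from two modalities.

```python
# Cross-attention fusion: one modality queries features of another.
import torch
import torch.nn as nn

rgb = torch.randn(1, 196, 256)   # tokens from an RGB branch (illustrative)
ir = torch.randn(1, 196, 256)    # tokens from a second modality, e.g. infrared

fuse = nn.MultiheadAttention(embed_dim=256, num_heads=4, batch_first=True)
fused, attn = fuse(query=rgb, key=ir, value=ir)  # RGB tokens attend to IR tokens
print(fused.shape)  # torch.Size([1, 196, 256])
```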