A comprehensive survey on Segment Anything Model for vision and beyond

C Zhang, L Liu, Y Cui, G Huang, W Lin, Y Yang… - arXiv preprint arXiv …, 2023 - arxiv.org
Artificial intelligence (AI) is evolving towards artificial general intelligence, which refers to the
ability of an AI system to perform a wide range of tasks and exhibit a level of intelligence …

Foundation models in smart agriculture: Basics, opportunities, and challenges

J Li, M Xu, L Xiang, D Chen, W Zhuang, X Yin… - … and Electronics in …, 2024 - Elsevier
The past decade has witnessed the rapid development and adoption of machine and deep
learning (ML & DL) methodologies in agricultural systems, showcased by great successes in …

Multimodal foundation models: From specialists to general-purpose assistants

C Li, Z Gan, Z Yang, J Yang, L Li… - … and Trends® in …, 2024 - nowpublishers.com
This paper presents a comprehensive survey of the taxonomy and evolution of multimodal
foundation models that demonstrate vision and vision-language capabilities, focusing on the …

SAM-6D: Segment Anything Model meets zero-shot 6D object pose estimation

J Lin, L Liu, D Lu, K Jia - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
Zero-shot 6D object pose estimation involves the detection of novel objects with their 6D
poses in cluttered scenes, presenting significant challenges for model generalizability …

Input augmentation with SAM: Boosting medical image segmentation with segmentation foundation model

Y Zhang, T Zhou, S Wang, P Liang, Y Zhang… - … Conference on Medical …, 2023 - Springer
The Segment Anything Model (SAM) is a recently developed large model for
general-purpose segmentation for computer vision tasks. SAM was trained using 11 million …

From SAM to CAMs: Exploring Segment Anything Model for weakly supervised semantic segmentation

H Kweon, KJ Yoon - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
Weakly Supervised Semantic Segmentation (WSSS) aims to learn the concept of
segmentation using image-level class labels. Recent WSSS works have shown promising …

Sora as an AGI world model? A complete survey on text-to-video generation

J Cho, FD Puspitasari, S Zheng, J Zheng… - arXiv preprint arXiv …, 2024 - arxiv.org
The evolution of video generation from text, starting with animating MNIST numbers to
simulating the physical world with Sora, has progressed at a breakneck speed over the past …

ClipSAM: CLIP and SAM collaboration for zero-shot anomaly segmentation

S Li, J Cao, P Ye, Y Ding, C Tu, T Chen - Neurocomputing, 2025 - Elsevier
Zero-Shot Anomaly Segmentation (ZSAS) aims to segment anomalies without any
training data related to the test samples. Recently, while foundational models like CLIP and …

MeSAM: Multiscale enhanced Segment Anything Model for optical remote sensing images

X Zhou, F Liang, L Chen, H Liu, Q Song… - … on Geoscience and …, 2024 - ieeexplore.ieee.org
Segment anything model (SAM) has been widely applied to various downstream tasks for its
excellent performance and generalization capability. However, SAM exhibits three …

Visual prompting in multimodal large language models: A survey

J Wu, Z Zhang, Y Xia, X Li, Z Xia, A Chang, T Yu… - arXiv preprint arXiv …, 2024 - arxiv.org
Multimodal large language models (MLLMs) equip pre-trained large-language models
(LLMs) with visual capabilities. While textual prompting in LLMs has been widely studied …