A survey of robot intelligence with large language models

H Jeong, H Lee, C Kim, S Shin - Applied Sciences, 2024 - mdpi.com
Since the emergence of ChatGPT, research on large language models (LLMs) has actively
progressed across various fields. LLMs, pre-trained on vast text datasets, have exhibited …

Segment Anything in medical images and videos: Benchmark and deployment

J Ma, S Kim, F Li, M Baharoon, R Asakereh… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advances in segmentation foundation models have enabled accurate and efficient
segmentation across a wide range of natural images and videos, but their utility to medical …

GR-2: A generative video-language-action model with web-scale knowledge for robot manipulation

CL Cheang, G Chen, Y Jing, T Kong, H Li, Y Li… - arXiv preprint arXiv …, 2024 - arxiv.org
We present GR-2, a state-of-the-art generalist robot agent for versatile and generalizable
robot manipulation. GR-2 is first pre-trained on a vast number of Internet videos to capture …

Large scale foundation models for intelligent manufacturing applications: a survey

H Zhang, SD Semujju, Z Wang, X Lv, K Xu… - Journal of Intelligent …, 2025 - Springer
Although applications of artificial intelligence, especially deep learning, have greatly
improved various aspects of intelligent manufacturing, they still face challenges for broader …

Flow as the cross-domain manipulation interface

M Xu, Z Xu, Y Xu, C Chi, G Wetzstein, M Veloso… - arXiv preprint arXiv …, 2024 - arxiv.org
We present Im2Flow2Act, a scalable learning framework that enables robots to acquire real-
world manipulation skills without the need for real-world robot training data. The key idea …

SAM2-UNet: Segment Anything 2 makes a strong encoder for natural and medical image segmentation

X Xiong, Z Wu, S Tan, W Li, F Tang, Y Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
Image segmentation plays an important role in vision understanding. Recently, emerging
vision foundation models have continuously achieved superior performance on various tasks …

SAM2-Adapter: Evaluating & adapting Segment Anything 2 in downstream tasks: Camouflage, shadow, medical image segmentation, and more

T Chen, A Lu, L Zhu, C Ding, C Yu, D Ji, Z Li… - arXiv preprint arXiv …, 2024 - arxiv.org
The advent of large models, also known as foundation models, has significantly transformed
the AI research landscape, with models like Segment Anything (SAM) achieving notable …

UniMatch V2: Pushing the limit of semi-supervised semantic segmentation

L Yang, Z Zhao, H Zhao - IEEE Transactions on Pattern …, 2025 - ieeexplore.ieee.org
Semi-supervised semantic segmentation (SSS) aims at learning rich visual knowledge from
cheap unlabeled images to enhance semantic segmentation capability. Among recent …

PointSAM: Pointly-Supervised Segment Anything Model for Remote Sensing Images

N Liu, X Xu, Y Su, H Zhang… - IEEE Transactions on …, 2025 - ieeexplore.ieee.org
Segment Anything Model (SAM) is an advanced foundational model for image
segmentation, which is gradually being applied to remote sensing images (RSIs). Due to the …

EVF-SAM: Early vision-language fusion for text-prompted Segment Anything Model

Y Zhang, T Cheng, R Hu, L Liu, H Liu, L Ran… - arXiv preprint arXiv …, 2024 - arxiv.org
Segment Anything Model (SAM) has attracted widespread attention for its superior
interactive segmentation capabilities with visual prompts while lacking further exploration of …