Foundation models for generalist medical artificial intelligence
The exceptionally rapid development of highly flexible, reusable artificial intelligence (AI)
models is likely to usher in newfound capabilities in medicine. We propose a new paradigm …
A comprehensive survey of AI-generated content (AIGC): A history of generative AI from GAN to ChatGPT
Recently, ChatGPT, along with DALL-E-2 and Codex, has been gaining significant attention
from society. As a result, many individuals have become interested in related resources and …
Segment anything
We introduce the Segment Anything (SA) project: a new task, model, and dataset for
image segmentation. Using our efficient model in a data collection loop, we built the largest …
Grounding DINO: Marrying DINO with grounded pre-training for open-set object detection
In this paper, we develop an open-set object detector, called Grounding DINO, by marrying
Transformer-based detector DINO with grounded pre-training, which can detect arbitrary …
YOLOv9: Learning what you want to learn using programmable gradient information
Today's deep learning methods focus on how to design the objective functions to make the
prediction as close as possible to the target. Meanwhile, an appropriate neural network …
Run, don't walk: chasing higher FLOPS for faster neural networks
To design fast neural networks, many works have been focusing on reducing the number of
floating-point operations (FLOPs). We observe that such reduction in FLOPs, however, does …
BiFormer: Vision transformer with bi-level routing attention
As the core building block of vision transformers, attention is a powerful tool to capture
long-range dependency. However, such power comes at a cost: it incurs a huge computation …
SpectralGPT: Spectral remote sensing foundation model
The foundation model has recently garnered significant attention due to its potential to
revolutionize the field of visual representation learning in a self-supervised manner. While …
VisionLLM: Large language model is also an open-ended decoder for vision-centric tasks
Large language models (LLMs) have notably accelerated progress towards artificial general
intelligence (AGI), with their impressive zero-shot capacity for user-tailored tasks, endowing …
Open-vocabulary panoptic segmentation with text-to-image diffusion models
We present ODISE: Open-vocabulary DIffusion-based panoptic SEgmentation, which unifies
pre-trained text-image diffusion and discriminative models to perform open-vocabulary …