- Academic Search

A comprehensive survey of ai-generated content (aigc): A history of generative ai from gan to chatgpt

Y Cao, S Li, Y Liu, Z Yan, Y Dai, PS Yu… - arxiv preprint arxiv …, 2023 - arxiv.org

Recently, ChatGPT, along with DALL-E-2 and Codex, has been gaining significant attention
from society. As a result, many individuals have become interested in related resources and …

保存引用被引用次数：825 相关文章所有 2 个版本 HTML 版

Segment anything

A Kirillov, E Mintun, N Ravi, H Mao… - Proceedings of the …, 2023 - openaccess.thecvf.com

Abstract We introduce the Segment Anything (SA) project: a new task, model, and dataset for
image segmentation. Using our efficient model in a data collection loop, we built the largest …

保存引用被引用次数：8351 相关文章所有 12 个版本 HTML 版

Grounding dino: Marrying dino with grounded pre-training for open-set object detection

S Liu, Z Zeng, T Ren, F Li, H Zhang, J Yang… - … on Computer Vision, 2024 - Springer

In this paper, we develop an open-set object detector, called Grounding DINO, by marrying
Transformer-based detector DINO with grounded pre-training, which can detect arbitrary …

保存引用被引用次数：1596 相关文章所有 4 个版本

Yolov9: Learning what you want to learn using programmable gradient information

CY Wang, IH Yeh, HY Mark Liao - European conference on computer …, 2024 - Springer

Today's deep learning methods focus on how to design the objective functions to make the
prediction as close as possible to the target. Meanwhile, an appropriate neural network …

保存引用被引用次数：1411 相关文章所有 3 个版本

Run, don't walk: chasing higher FLOPS for faster neural networks

J Chen, S Kao, H He, W Zhuo, S Wen… - Proceedings of the …, 2023 - openaccess.thecvf.com

To design fast neural networks, many works have been focusing on reducing the number of
floating-point operations (FLOPs). We observe that such reduction in FLOPs, however, does …

保存引用被引用次数：1190 相关文章所有 10 个版本 HTML 版

Biformer: Vision transformer with bi-level routing attention

L Zhu, X Wang, Z Ke, W Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com

As the core building block of vision transformers, attention is a powerful tool to capture long-
range dependency. However, such power comes at a cost: it incurs a huge computation …

保存引用被引用次数：713 相关文章所有 10 个版本 HTML 版

SpectralGPT: Spectral remote sensing foundation model

D Hong, B Zhang, X Li, Y Li, C Li, J Yao… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org

The foundation model has recently garnered significant attention due to its potential to
revolutionize the field of visual representation learning in a self-supervised manner. While …

保存引用被引用次数：447 相关文章所有 6 个版本

[PDF] neurips.cc

Visionllm: Large language model is also an open-ended decoder for vision-centric tasks

W Wang, Z Chen, X Chen, J Wu… - Advances in …, 2024 - proceedings.neurips.cc

Large language models (LLMs) have notably accelerated progress towards artificial general
intelligence (AGI), with their impressive zero-shot capacity for user-tailored tasks, endowing …

保存引用被引用次数：446 相关文章所有 6 个版本 HTML 版