- Academic Search

Parameter-efficient fine-tuning for large models: A comprehensive survey

Z Han, C Gao, J Liu, J Zhang, SQ Zhang - arXiv preprint arXiv:2403.14608, 2024 - arxiv.org

Large models represent a groundbreaking advancement in multiple application fields,
enabling remarkable achievements across various tasks. However, their unprecedented …

Cited by 241 · All 2 versions

FAIR1M: A benchmark dataset for fine-grained object recognition in high-resolution remote sensing imagery

X Sun, P Wang, Z Yan, F Xu, R Wang, W Diao… - ISPRS Journal of …, 2022 - Elsevier

With the rapid development of deep learning, many deep learning-based approaches have
made great achievements in object detection tasks. It is generally known that deep learning …

Cited by 371 · All 6 versions

BiFormer: Vision transformer with bi-level routing attention

L Zhu, X Wang, Z Ke, W Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com

As the core building block of vision transformers, attention is a powerful tool to capture long-
range dependency. However, such power comes at a cost: it incurs a huge computation …

Cited by 699 · All 10 versions

ConvNeXt V2: Co-designing and scaling ConvNets with masked autoencoders

S Woo, S Debnath, R Hu, X Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com

Driven by improved architectures and better representation learning frameworks, the field of
visual recognition has enjoyed rapid modernization and performance boost in the early …

Cited by 677 · All 8 versions

Vision Mamba: Efficient visual representation learning with bidirectional state space model

L Zhu, B Liao, Q Zhang, X Wang, W Liu… - arXiv preprint arXiv …, 2024 - arxiv.org

Recently the state space models (SSMs) with efficient hardware-aware designs, i.e., the
Mamba deep learning model, have shown great potential for long sequence modeling …

Cited by 1007 · All 5 versions

DiffusionDet: Diffusion model for object detection

S Chen, P Sun, Y Song, P Luo - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

We propose DiffusionDet, a new framework that formulates object detection as a denoising
diffusion process from noisy boxes to object boxes. During the training stage, object boxes …

Cited by 485 · All 5 versions

EfficientViT: Memory efficient vision transformer with cascaded group attention

X Liu, H Peng, N Zheng, Y Yang… - Proceedings of the …, 2023 - openaccess.thecvf.com

Vision transformers have shown great success due to their high model capabilities.
However, their remarkable performance is accompanied by heavy computation costs, which …

Cited by 344 · All 8 versions

Sequential modeling enables scalable learning for large vision models

Y Bai, X Geng, K Mangalam, A Bar… - Proceedings of the …, 2024 - openaccess.thecvf.com

We introduce a novel sequential modeling approach which enables learning a Large Vision
Model (LVM) without making use of any linguistic data. To do this we define a common …

Cited by 142 · All 3 versions

DETRs with collaborative hybrid assignments training

Z Zong, G Song, Y Liu - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com

In this paper, we provide the observation that too few queries assigned as positive samples
in DETR with one-to-one set matching leads to sparse supervision on the encoder's output …

Cited by 352 · All 5 versions

GPT4Tools: Teaching large language model to use tools via self-instruction

R Yang, L Song, Y Li, S Zhao, Y Ge… - Advances in Neural …, 2024 - proceedings.neurips.cc

This paper aims to efficiently enable Large Language Models (LLMs) to use multi-modal
tools. The advanced proprietary LLMs, such as ChatGPT and GPT-4, have shown great …

Cited by 179 · All 7 versions

Saved to "My library"

MMDetection: Open MMLab detection toolbox and benchmark

Parameter-efficient fine-tuning for large models: A comprehensive survey

FAIR1M: A benchmark dataset for fine-grained object recognition in high-resolution remote sensing imagery

BiFormer: Vision transformer with bi-level routing attention

ConvNeXt V2: Co-designing and scaling ConvNets with masked autoencoders

Vision Mamba: Efficient visual representation learning with bidirectional state space model

DiffusionDet: Diffusion model for object detection

EfficientViT: Memory efficient vision transformer with cascaded group attention

Sequential modeling enables scalable learning for large vision models

DETRs with collaborative hybrid assignments training

GPT4Tools: Teaching large language model to use tools via self-instruction