Parameter-efficient fine-tuning for large models: A comprehensive survey
Large models represent a groundbreaking advancement in multiple application fields,
enabling remarkable achievements across various tasks. However, their unprecedented …
enabling remarkable achievements across various tasks. However, their unprecedented …
FAIR1M: A benchmark dataset for fine-grained object recognition in high-resolution remote sensing imagery
With the rapid development of deep learning, many deep learning-based approaches have
made great achievements in object detection tasks. It is generally known that deep learning …
made great achievements in object detection tasks. It is generally known that deep learning …
Biformer: Vision transformer with bi-level routing attention
As the core building block of vision transformers, attention is a powerful tool to capture long-
range dependency. However, such power comes at a cost: it incurs a huge computation …
range dependency. However, such power comes at a cost: it incurs a huge computation …
Convnext v2: Co-designing and scaling convnets with masked autoencoders
Driven by improved architectures and better representation learning frameworks, the field of
visual recognition has enjoyed rapid modernization and performance boost in the early …
visual recognition has enjoyed rapid modernization and performance boost in the early …
Vision mamba: Efficient visual representation learning with bidirectional state space model
Recently the state space models (SSMs) with efficient hardware-aware designs, ie, the
Mamba deep learning model, have shown great potential for long sequence modeling …
Mamba deep learning model, have shown great potential for long sequence modeling …
Diffusiondet: Diffusion model for object detection
We propose DiffusionDet, a new framework that formulates object detection as a denoising
diffusion process from noisy boxes to object boxes. During the training stage, object boxes …
diffusion process from noisy boxes to object boxes. During the training stage, object boxes …
Efficientvit: Memory efficient vision transformer with cascaded group attention
Vision transformers have shown great success due to their high model capabilities.
However, their remarkable performance is accompanied by heavy computation costs, which …
However, their remarkable performance is accompanied by heavy computation costs, which …
Sequential modeling enables scalable learning for large vision models
We introduce a novel sequential modeling approach which enables learning a Large Vision
Model (LVM) without making use of any linguistic data. To do this we define a common …
Model (LVM) without making use of any linguistic data. To do this we define a common …
Detrs with collaborative hybrid assignments training
In this paper, we provide the observation that too few queries assigned as positive samples
in DETR with one-to-one set matching leads to sparse supervision on the encoder's output …
in DETR with one-to-one set matching leads to sparse supervision on the encoder's output …
Gpt4tools: Teaching large language model to use tools via self-instruction
This paper aims to efficiently enable Large Language Models (LLMs) to use multi-modal
tools. The advanced proprietary LLMs, such as ChatGPT and GPT-4, have shown great …
tools. The advanced proprietary LLMs, such as ChatGPT and GPT-4, have shown great …