Parameter-efficient fine-tuning for large models: A comprehensive survey

Z Han, C Gao, J Liu, J Zhang, SQ Zhang - arXiv preprint arXiv:2403.14608, 2024 - arxiv.org
Large models represent a groundbreaking advancement in multiple application fields,
enabling remarkable achievements across various tasks. However, their unprecedented …

Recurrent neural networks: A comprehensive review of architectures, variants, and applications

ID Mienye, TG Swart, G Obaido - Information, 2024 - mdpi.com
Recurrent neural networks (RNNs) have significantly advanced the field of machine learning
(ML) by enabling the effective processing of sequential data. This paper provides a …

Visual autoregressive modeling: Scalable image generation via next-scale prediction

K Tian, Y Jiang, Z Yuan, B Peng… - Advances in neural …, 2025 - proceedings.neurips.cc
We present Visual AutoRegressive modeling (VAR), a new generation paradigm
that redefines autoregressive learning on images as "coarse-to-fine" next-scale …

An image is worth 32 tokens for reconstruction and generation

Q Yu, M Weber, X Deng, X Shen… - Advances in Neural …, 2025 - proceedings.neurips.cc
Recent advancements in generative models have highlighted the crucial role of image
tokenization in the efficient synthesis of high-resolution images. Tokenization, which …

Autoregressive model beats diffusion: Llama for scalable image generation

P Sun, Y Jiang, S Chen, S Zhang, B Peng… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce LlamaGen, a new family of image generation models that apply the original "next-token prediction" paradigm of large language models to the visual generation domain. It is an …

OMG-Seg: Is one model good enough for all segmentation?

X Li, H Yuan, W Li, H Ding, S Wu… - Proceedings of the …, 2024 - openaccess.thecvf.com
In this work, we address various segmentation tasks, each traditionally tackled by distinct or
partially unified models. We propose OMG-Seg, One Model that is Good enough to efficiently …

When do we not need larger vision models?

B Shi, Z Wu, M Mao, X Wang, T Darrell - European Conference on …, 2024 - Springer
Scaling up the size of vision models has been the de facto standard to obtain more powerful
visual representations. In this work, we discuss the point beyond which larger vision models …

ShapeLLM: Universal 3D object understanding for embodied interaction

Z Qi, R Dong, S Zhang, H Geng, C Han, Z Ge… - … on Computer Vision, 2024 - Springer
This paper presents ShapeLLM, the first 3D Multimodal Large Language Model (LLM)
designed for embodied interaction, exploring a universal 3D object understanding with 3D …

BRAVE: Broadening the visual encoding of vision-language models

OF Kar, A Tonioni, P Poklukar, A Kulshrestha… - … on Computer Vision, 2024 - Springer
Vision-language models (VLMs) are typically composed of a vision encoder, e.g., CLIP, and a
language model (LM) that interprets the encoded features to solve downstream tasks …

Scalable pre-training of large autoregressive image models

A El-Nouby, M Klein, S Zhai, MA Bautista… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper introduces AIM, a collection of vision models pre-trained with an autoregressive
objective. These models are inspired by their textual counterparts, i.e., Large Language …