Parameter-efficient fine-tuning for large models: A comprehensive survey

Z Han, C Gao, J Liu, J Zhang, SQ Zhang - arXiv preprint arXiv:2403.14608, 2024 - arxiv.org
Large models represent a groundbreaking advancement in multiple application fields,
enabling remarkable achievements across various tasks. However, their unprecedented …

Recurrent neural networks: A comprehensive review of architectures, variants, and applications

ID Mienye, TG Swart, G Obaido - Information, 2024 - mdpi.com
Recurrent neural networks (RNNs) have significantly advanced the field of machine learning
(ML) by enabling the effective processing of sequential data. This paper provides a …

Visual autoregressive modeling: Scalable image generation via next-scale prediction

K Tian, Y Jiang, Z Yuan, B Peng… - Advances in neural …, 2025 - proceedings.neurips.cc
We present Visual AutoRegressive modeling (VAR), a new generation paradigm
that redefines autoregressive learning on images as "coarse-to-fine" next-scale …

An image is worth 32 tokens for reconstruction and generation

Q Yu, M Weber, X Deng, X Shen… - Advances in Neural …, 2025 - proceedings.neurips.cc
Recent advancements in generative models have highlighted the crucial role of image
tokenization in the efficient synthesis of high-resolution images. Tokenization, which …

Autoregressive model beats diffusion: Llama for scalable image generation

P Sun, Y Jiang, S Chen, S Zhang, B Peng… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce LlamaGen, a new family of image generation models that apply the original "next-token prediction" paradigm of large language models to the visual generation domain. It is an …

OMG-Seg: Is one model good enough for all segmentation?

X Li, H Yuan, W Li, H Ding, S Wu… - Proceedings of the …, 2024 - openaccess.thecvf.com
In this work, we address various segmentation tasks, each traditionally tackled by distinct or
partially unified models. We propose OMG-Seg, One Model that is Good enough to efficiently …

When do we not need larger vision models?

B Shi, Z Wu, M Mao, X Wang, T Darrell - European Conference on …, 2024 - Springer
Scaling up the size of vision models has been the de facto standard to obtain more powerful
visual representations. In this work, we discuss the point beyond which larger vision models …

ShapeLLM: Universal 3D object understanding for embodied interaction

Z Qi, R Dong, S Zhang, H Geng, C Han, Z Ge… - … on Computer Vision, 2024 - Springer
This paper presents ShapeLLM, the first 3D Multimodal Large Language Model (LLM)
designed for embodied interaction, exploring a universal 3D object understanding with 3D …

BRAVE: Broadening the visual encoding of vision-language models

OF Kar, A Tonioni, P Poklukar, A Kulshrestha… - … on Computer Vision, 2024 - Springer
Vision-language models (VLMs) are typically composed of a vision encoder, e.g., CLIP, and a
language model (LM) that interprets the encoded features to solve downstream tasks …

Scalable pre-training of large autoregressive image models

A El-Nouby, M Klein, S Zhai, MA Bautista… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper introduces AIM, a collection of vision models pre-trained with an autoregressive
objective. These models are inspired by their textual counterparts, i.e., Large Language …