A comprehensive survey on pretrained foundation models: A history from BERT to ChatGPT
Pretrained Foundation Models (PFMs) are regarded as the foundation for various
downstream tasks across different data modalities. A PFM (e.g., BERT, ChatGPT, GPT-4) is …
A comprehensive survey on applications of transformers for deep learning tasks
Transformers are Deep Neural Networks (DNNs) that utilize a self-attention
mechanism to capture contextual relationships within sequential data. Unlike traditional …
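For concreteness, the snippet below is a minimal NumPy sketch of the scaled dot-product self-attention the abstract refers to: each token's output is a context-weighted mixture of every token's value vector. The single-head setup, function name, and toy dimensions are illustrative assumptions, not code from the survey.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over one sequence (single head).

    x: (seq_len, d_model) token embeddings.
    w_q, w_k, w_v: (d_model, d_k) projection matrices.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])            # pairwise token affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over the sequence
    return weights @ v                                 # each token mixes context from all tokens

# Toy usage: 4 tokens, 8-dimensional embeddings.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 8)) * 0.1 for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)          # (4, 8)
```

Because every pair of positions interacts directly in the `scores` matrix, long-range context is captured in a single step rather than propagated sequentially.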
Scaling up GANs for text-to-image synthesis
The recent success of text-to-image synthesis has taken the world by storm and captured the
general public's imagination. From a technical standpoint, it also marked a drastic change in …
ConvNeXt V2: Co-designing and scaling ConvNets with masked autoencoders
Driven by improved architectures and better representation learning frameworks, the field of
visual recognition has enjoyed rapid modernization and a performance boost in the early …
VideoMAE V2: Scaling video masked autoencoders with dual masking
Scale is the primary factor in building a powerful foundation model that can generalize well
to a variety of downstream tasks. However, it is still challenging to train video …
Scalable diffusion models with transformers
We explore a new class of diffusion models based on the transformer architecture. We train
latent diffusion models of images, replacing the commonly-used U-Net backbone with a …
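To illustrate what "replacing the U-Net backbone with a transformer" can look like in code, here is a rough PyTorch sketch of a ViT-style denoiser that patchifies a latent image, conditions on the timestep, and predicts noise of the same shape. The class name, layer sizes, and additive timestep conditioning are my own simplifications, not the paper's DiT design.

```python
import torch
import torch.nn as nn

class TransformerDenoiser(nn.Module):
    """Sketch: a transformer over latent patches standing in for a U-Net denoiser."""

    def __init__(self, c=4, patch=2, dim=256, depth=4, heads=4, n_tokens=256):
        super().__init__()
        self.patch = patch
        self.embed = nn.Linear(c * patch * patch, dim)            # patchify -> token
        self.pos = nn.Parameter(torch.zeros(1, n_tokens, dim))    # learned positions
        self.t_embed = nn.Sequential(nn.Linear(1, dim), nn.SiLU(), nn.Linear(dim, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, depth)
        self.out = nn.Linear(dim, c * patch * patch)              # token -> patch of noise

    def forward(self, x, t):
        b, c, h, w = x.shape
        p = self.patch
        tokens = x.unfold(2, p, p).unfold(3, p, p)                # (b, c, h/p, w/p, p, p)
        tokens = tokens.permute(0, 2, 3, 1, 4, 5).reshape(b, -1, c * p * p)
        cond = self.t_embed(t[:, None]).unsqueeze(1)              # simple additive conditioning
        hidden = self.blocks(self.embed(tokens) + self.pos + cond)
        out = self.out(hidden).reshape(b, h // p, w // p, c, p, p)
        out = out.permute(0, 3, 1, 4, 2, 5)                       # un-patchify
        return out.reshape(b, c, h, w)                            # predicted noise

# Toy usage on a 32x32, 4-channel latent.
model = TransformerDenoiser()
x, t = torch.randn(2, 4, 32, 32), torch.rand(2)
print(model(x, t).shape)  # torch.Size([2, 4, 32, 32])
```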
Sigmoid loss for language image pre-training
We propose a simple pairwise sigmoid loss for image-text pre-training. Unlike standard
contrastive learning with softmax normalization, the sigmoid loss operates solely on image …
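The pairwise loss the abstract describes can be sketched as follows: every image-text pairing in a batch gets its own binary sigmoid term, with the matched pair as the positive and all other pairings as negatives, so no batch-wide softmax normalization is needed. The fixed temperature and bias constants below stand in for scalars the paper treats as learnable.

```python
import numpy as np

def sigmoid_pairwise_loss(img_emb, txt_emb, t=10.0, b=-10.0):
    """Pairwise sigmoid image-text loss over a batch of matched pairs.

    img_emb, txt_emb: (n, d) L2-normalized embeddings; row i of each is a matched pair.
    """
    logits = t * img_emb @ txt_emb.T + b               # (n, n) pairwise similarities
    labels = 2.0 * np.eye(len(img_emb)) - 1.0          # +1 on the diagonal, -1 elsewhere
    log_sig = -np.logaddexp(0.0, -labels * logits)     # log sigmoid(label * logit), stable form
    return -log_sig.sum() / len(img_emb)               # average per image

# Toy usage with random unit vectors.
rng = np.random.default_rng(0)
unit = lambda x: x / np.linalg.norm(x, axis=-1, keepdims=True)
img, txt = unit(rng.normal(size=(4, 16))), unit(rng.normal(size=(4, 16)))
print(sigmoid_pairwise_loss(img, txt))
```

Because each term depends only on one image embedding and one text embedding, the loss decouples from the global batch structure, which is what makes scaling the batch size cheaper than with a softmax-normalized objective.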
Scaling data-constrained language models
The current trend of scaling language models involves increasing both parameter count and
training dataset size. Extrapolating this trend suggests that training dataset size may soon be …
EVA: Exploring the limits of masked visual representation learning at scale
We launch EVA, a vision-centric foundation model to explore the limits of visual
representation at scale using only publicly accessible data. EVA is a vanilla ViT pre-trained …
Scaling language-image pre-training via masking
We present Fast Language-Image Pre-training (FLIP), a simple and more efficient
method for training CLIP. Our method randomly masks out and removes a large portion of …
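The masking step described in the abstract can be sketched as follows: a large random subset of patch tokens is dropped before the image encoder, so only the kept patches are processed during training. The function name, keep ratio, and shapes are illustrative assumptions rather than FLIP's actual implementation.

```python
import numpy as np

def random_patch_mask(patches, keep_ratio=0.5, rng=None):
    """Randomly keep a fraction of image patches and drop the rest.

    patches: (n_patches, d) patch embeddings for one image.
    Returns the kept patches plus their sorted indices so positions can be recovered.
    """
    rng = rng or np.random.default_rng()
    n_keep = max(1, int(len(patches) * keep_ratio))
    keep_idx = np.sort(rng.permutation(len(patches))[:n_keep])
    return patches[keep_idx], keep_idx

# Toy usage: a 14x14 ViT patch grid, keeping half of the 196 patches.
rng = np.random.default_rng(0)
patches = rng.normal(size=(196, 768))
kept, idx = random_patch_mask(patches, keep_ratio=0.5, rng=rng)
print(kept.shape)  # (98, 768) -- the image encoder only sees these tokens
```

Since the encoder processes fewer tokens per image, each training step costs less compute, which is the efficiency the abstract refers to.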