A comprehensive survey on pretrained foundation models: A history from BERT to ChatGPT

C Zhou, Q Li, C Li, J Yu, Y Liu, G Wang… - International Journal of …, 2024 - Springer
Pretrained Foundation Models (PFMs) are regarded as the foundation for various downstream tasks across different data modalities. A PFM (e.g., BERT, ChatGPT, GPT-4) is …

Foundation Models Defining a New Era in Vision: A Survey and Outlook

M Awais, M Naseer, S Khan, RM Anwer… - … on Pattern Analysis …, 2025 - ieeexplore.ieee.org
Vision systems that see and reason about the compositional nature of visual scenes are
fundamental to understanding our world. The complex relations between objects and their …

Grounding DINO: Marrying DINO with grounded pre-training for open-set object detection

S Liu, Z Zeng, T Ren, F Li, H Zhang, J Yang… - … on Computer Vision, 2024 - Springer
In this paper, we develop an open-set object detector, called Grounding DINO, by marrying the Transformer-based detector DINO with grounded pre-training, which can detect arbitrary …

Objaverse: A universe of annotated 3D objects

M Deitke, D Schwenk, J Salvador… - Proceedings of the …, 2023 - openaccess.thecvf.com
Massive data corpora like WebText, Wikipedia, Conceptual Captions, WebImageText, and
LAION have propelled recent dramatic progress in AI. Large neural models trained on such …

Reproducible scaling laws for contrastive language-image learning

M Cherti, R Beaumont, R Wightman… - Proceedings of the …, 2023 - openaccess.thecvf.com
Scaling up neural networks has led to remarkable performance across a wide range of
tasks. Moreover, performance often follows reliable scaling laws as a function of training set …
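
As a hedged aside on what "scaling laws" typically means in this setting: error falls predictably with a scale variable n (samples seen, parameters, or compute) toward an irreducible floor. The function below is a generic illustration; its name and the fitted constants a, b, c are assumptions, not the cited paper's exact parameterization.

def power_law_error(n, a, b, c):
    # Generic scaling-law form: error decreases as a power of scale n
    # and approaches an irreducible floor c; a, b, c are fitted constants.
    return a * n ** (-b) + c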

Convolutions die hard: Open-vocabulary segmentation with single frozen convolutional CLIP

Q Yu, J He, X Deng, X Shen… - Advances in Neural …, 2023 - proceedings.neurips.cc
Open-vocabulary segmentation is a challenging task requiring segmenting and recognizing
objects from an open set of categories in diverse environments. One way to address this …

InternImage: Exploring large-scale vision foundation models with deformable convolutions

W Wang, J Dai, Z Chen, Z Huang, Z Li… - Proceedings of the …, 2023 - openaccess.thecvf.com
Compared to the great progress of large-scale vision transformers (ViTs) in recent years,
large-scale models based on convolutional neural networks (CNNs) are still in an early …

Towards generalist biomedical AI

T Tu, S Azizi, D Driess, M Schaekermann, M Amin… - NEJM AI, 2024 - ai.nejm.org
Background Medicine is inherently multimodal, requiring the simultaneous interpretation
and integration of insights between many data modalities spanning text, imaging, genomics …

StyleGAN-T: Unlocking the power of GANs for fast large-scale text-to-image synthesis

A Sauer, T Karras, S Laine… - … on machine learning, 2023 - proceedings.mlr.press
Text-to-image synthesis has recently seen significant progress thanks to large pretrained
language models, large-scale training data, and the introduction of scalable model families …

Improving CLIP training with language rewrites

L Fan, D Krishnan, P Isola… - Advances in Neural …, 2023 - proceedings.neurips.cc
Contrastive Language-Image Pre-training (CLIP) stands as one of the most effective and scalable methods for training transferable vision models using paired image and text …
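
As a hedged sketch of what contrastive training on paired images and text involves: the CLIP-style objective scores every image against every text in a batch and pulls matching pairs together with a symmetric cross-entropy. The function below is a generic illustration, not the cited paper's implementation; the name and the fixed temperature are assumptions (CLIP itself learns the temperature).

import torch
import torch.nn.functional as F

def contrastive_image_text_loss(image_emb, text_emb, temperature=0.07):
    # Normalize so the dot product is cosine similarity.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    # Similarity logits for every image against every text in the batch.
    logits = image_emb @ text_emb.t() / temperature
    # Matching image-text pairs lie on the diagonal.
    targets = torch.arange(logits.size(0), device=logits.device)
    # Symmetric cross-entropy: image-to-text and text-to-image directions.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))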