A comprehensive survey of AI-generated content (AIGC): A history of generative AI from GAN to ChatGPT

Y Cao, S Li, Y Liu, Z Yan, Y Dai, PS Yu… - arXiv preprint arXiv …, 2023 - arxiv.org

Blip-2: Bootstrapping language-image pre-training with frozen image encoders and large language models
J Li, D Li, S Savarese, S Hoi - International conference on …, 2023 - proceedings.mlr.press
The cost of vision-and-language pre-training has become increasingly prohibitive due to
end-to-end training of large-scale models. This paper proposes BLIP-2, a generic and …

MMMU: A massive multi-discipline multimodal understanding and reasoning benchmark for expert AGI

X Yue, Y Ni, K Zhang, T Zheng, R Liu… - Proceedings of the …, 2024 - openaccess.thecvf.com
We introduce MMMU: a new benchmark designed to evaluate multimodal models on
massive multi-discipline tasks demanding college-level subject knowledge and deliberate …

Objaverse: A universe of annotated 3D objects

M Deitke, D Schwenk, J Salvador… - Proceedings of the …, 2023 - openaccess.thecvf.com
Massive data corpora like WebText, Wikipedia, Conceptual Captions, WebImageText, and
LAION have propelled recent dramatic progress in AI. Large neural models trained on such …

[PDF] The dawn of LMMs: Preliminary explorations with GPT-4V(ision)

Z Yang, L Li, K Lin, J Wang, CC Lin… - arXiv preprint arXiv …, 2023 - stableaiprompts.com
Large multimodal models (LMMs) extend large language models (LLMs) with multi-sensory
skills, such as visual understanding, to achieve stronger generic intelligence. In this paper …

Convolutions die hard: Open-vocabulary segmentation with single frozen convolutional CLIP

Q Yu, J He, X Deng, X Shen… - Advances in Neural …, 2023 - proceedings.neurips.cc
Open-vocabulary segmentation is a challenging task requiring segmenting and recognizing
objects from an open set of categories in diverse environments. One way to address this …

Vid2Seq: Large-scale pretraining of a visual language model for dense video captioning

A Yang, A Nagrani, PH Seo, A Miech… - Proceedings of the …, 2023 - openaccess.thecvf.com
In this work, we introduce Vid2Seq, a multi-modal single-stage dense event captioning
model pretrained on narrated videos which are readily-available at scale. The Vid2Seq …