Google Scholar

A comprehensive survey on pretrained foundation models: A history from BERT to ChatGPT

C Zhou, Q Li, C Li, J Yu, Y Liu, G Wang… - International Journal of …, 2024 - Springer

Pretrained Foundation Models (PFMs) are regarded as the foundation for various
downstream tasks across different data modalities. A PFM (e.g., BERT, ChatGPT, GPT-4) is …

Cited by 610 · All 2 versions

[PDF] acm.org

Foundations & trends in multimodal machine learning: Principles, challenges, and open questions

PP Liang, A Zadeh, LP Morency - ACM Computing Surveys, 2024 - dl.acm.org

Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …

Cited by 77

[PDF] thecvf.com

ImageBind: One embedding space to bind them all

R Girdhar, A El-Nouby, Z Liu, M Singh… - Proceedings of the …, 2023 - openaccess.thecvf.com

We present ImageBind, an approach to learn a joint embedding across six different
modalities: images, text, audio, depth, thermal, and IMU data. We show that all combinations …

Cited by 840 · All 7 versions

[PDF] nowpublishers.com

Multimodal foundation models: From specialists to general-purpose assistants

C Li, Z Gan, Z Yang, J Yang, L Li… - … and Trends® in …, 2024 - nowpublishers.com

Neural compression is the application of neural networks and other machine learning
methods to data compression. Recent advances in statistical machine learning have opened …

Cited by 214 · All 6 versions

[PDF] neurips.cc

StableRep: Synthetic images from text-to-image models make strong visual representation learners

Y Tian, L Fan, P Isola, H Chang… - Advances in Neural …, 2024 - proceedings.neurips.cc

We investigate the potential of learning visual representations using synthetic images
generated by text-to-image models. This is a natural question in the light of the excellent …

Cited by 128 · All 5 versions

[PDF] thecvf.com

Fake it till you make it: Learning transferable representations from synthetic ImageNet clones

MB Sarıyıldız, K Alahari, D Larlus… - Proceedings of the …, 2023 - openaccess.thecvf.com

Recent image generation models such as Stable Diffusion have exhibited an impressive
ability to generate fairly realistic images starting from a simple text prompt. Could such …

Cited by 150 · All 13 versions

[PDF] arxiv.org

Masked siamese networks for label-efficient learning

M Assran, M Caron, I Misra, P Bojanowski… - … on Computer Vision, 2022 - Springer

We propose Masked Siamese Networks (MSN), a self-supervised learning
framework for learning image representations. Our approach matches the representation of …

Cited by 355 · All 5 versions

[PDF] thecvf.com

Versatile diffusion: Text, images and variations all in one diffusion model

X Xu, Z Wang, G Zhang, K Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com

Recent advances in diffusion models have set an impressive milestone in many generation
tasks, and trending works such as DALL-E2, Imagen, and Stable Diffusion have attracted …

Cited by 160 · All 6 versions

[PDF] neurips.cc

ST-Adapter: Parameter-efficient image-to-video transfer learning

J Pan, Z Lin, X Zhu, J Shao, H Li - Advances in Neural …, 2022 - proceedings.neurips.cc

Capitalizing on large pre-trained models for various downstream tasks of interest has
recently emerged with promising performance. Due to the ever-growing model size, the …

Cited by 253 · All 7 versions

[PDF] thecvf.com

Aligning bag of regions for open-vocabulary object detection

S Wu, W Zhang, S **, W Liu… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

Pre-trained vision-language models (VLMs) learn to align vision and language
representations on large-scale datasets, where each image-text pair usually contains a bag …

Cited by 114 · All 5 versions
