MM-LLMs: Recent advances in multimodal large language models

D Zhang, Y Yu, J Dong, C Li, D Su, C Chu… - arXiv preprint arXiv:…, 2024 - arxiv.org
In the past year, MultiModal Large Language Models (MM-LLMs) have undergone
substantial advancements, augmenting off-the-shelf LLMs to support MM inputs or outputs …

Vision-language pre-training: Basics, recent advances, and future trends

Z Gan, L Li, C Li, L Wang, Z Liu… - Foundations and Trends …, 2022 - nowpublishers.com
This monograph surveys vision-language pre-training (VLP) methods for multimodal
intelligence that have been developed in the last few years. We group these approaches …

Multimodal learning with transformers: A survey

P Xu, X Zhu, DA Clifton - IEEE Transactions on Pattern Analysis …, 2023 - ieeexplore.ieee.org
The Transformer is a promising neural network learner and has achieved great success in
various machine learning tasks. Thanks to the recent prevalence of multimodal applications …

Large-scale multi-modal pre-trained models: A comprehensive survey

X Wang, G Chen, G Qian, P Gao, XY Wei… - Machine Intelligence …, 2023 - Springer
With the urgent demand for generalized deep models, many pre-trained big models are
proposed, such as bidirectional encoder representations (BERT), vision transformer (ViT) …

No "zero-shot" without exponential data: Pretraining concept frequency determines multimodal model performance

V Udandarao, A Prabhu, A Ghosh… - Advances in …, 2025 - proceedings.neurips.cc
Web-crawled pretraining datasets underlie the impressive "zero-shot" evaluation
performance of multimodal models, such as CLIP for classification and Stable-Diffusion for …

Visual language pretrained multiple instance zero-shot transfer for histopathology images

MY Lu, B Chen, A Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Contrastive visual language pretraining has emerged as a powerful method for either
training new language-aware image encoders or augmenting existing pretrained models …

Hallucination augmented contrastive learning for multimodal large language model

C Jiang, H Xu, M Dong, J Chen, W Ye… - Proceedings of the …, 2024 - openaccess.thecvf.com
Multi-modal large language models (MLLMs) have been shown to efficiently integrate
natural language with visual information to handle multi-modal tasks. However, MLLMs still …

Dual memory networks: A versatile adaptation approach for vision-language models

Y Zhang, W Zhu, H Tang, Z Ma… - Proceedings of the …, 2024 - openaccess.thecvf.com
With the emergence of pre-trained vision-language models like CLIP, how to adapt them to
various downstream classification tasks has garnered significant attention in recent …

VLP: A survey on vision-language pre-training

FL Chen, DZ Zhang, ML Han, XY Chen, J Shi… - Machine Intelligence …, 2023 - Springer
In the past few years, the emergence of pre-training models has brought uni-modal fields
such as computer vision (CV) and natural language processing (NLP) to a new era …

Detecting and grounding multi-modal media manipulation

R Shao, T Wu, Z Liu - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com
Misinformation has become a pressing issue. Fake media, in both visual and textual forms,
is widespread on the web. While various deepfake detection and text fake news detection …