The multi-modal fusion in visual question answering: a review of attention mechanisms
Visual Question Answering (VQA) is a significant cross-disciplinary issue in the fields of computer vision and natural language processing that requires a computer to output …
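As a rough illustration of the attention-style multimodal fusion such reviews cover, here is a minimal sketch of question-guided attention over image-region features; the shapes, the scaled dot-product scoring, and the elementwise fusion are illustrative assumptions, not any single surveyed model.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attended_fusion(question_vec, region_feats):
    """Question-guided attention over image regions, then fusion.

    question_vec: (d,) pooled question embedding
    region_feats: (k, d) image-region features
    Returns a fused (d,) vector a VQA head could classify over answers.
    """
    d = question_vec.shape[0]
    scores = region_feats @ question_vec / np.sqrt(d)  # (k,) relevance of each region
    weights = softmax(scores)                          # attention distribution over regions
    attended = weights @ region_feats                  # (d,) question-conditioned visual summary
    return attended * question_vec                     # elementwise fusion, one common choice

# Toy usage with random features (e.g., 36 detected regions, 512-d embeddings).
rng = np.random.default_rng(0)
fused = attended_fusion(rng.normal(size=512), rng.normal(size=(36, 512)))
print(fused.shape)  # (512,)
```

Real systems replace the raw dot product with learned projections (and often stack bilinear or co-attention variants), but the attend-then-fuse pattern is the same.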
Vision-language pre-training: Basics, recent advances, and future trends
This monograph surveys vision-language pre-training (VLP) methods for multimodal intelligence that have been developed in the last few years. We group these approaches …
Cambrian-1: A fully open, vision-centric exploration of multimodal LLMs
We introduce Cambrian-1, a family of multimodal LLMs (MLLMs) designed with a vision-centric approach. While stronger language models can enhance multimodal capabilities, the …
Conceptual 12M: Pushing web-scale image-text pre-training to recognize long-tail visual concepts
The availability of large-scale image captioning and visual question answering datasets has contributed significantly to recent successes in vision-and-language pre-training. However …
Symbolic knowledge distillation: from general language models to commonsense models
The common practice for training commonsense models has gone from-human-to-corpus-to-machine: humans author commonsense knowledge graphs in order to train commonsense …
From images to textual prompts: Zero-shot visual question answering with frozen large language models
Large language models (LLMs) have demonstrated excellent zero-shot generalization to new language tasks. However, effective utilization of LLMs for zero-shot visual question …
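The image-to-prompt idea sketches easily: caption the image with an off-the-shelf model, then pass the caption plus the question to a frozen LLM as plain text. The pipeline below is a hedged approximation under that reading, not the paper's exact recipe; the model names and prompt template are placeholder choices.

```python
from transformers import pipeline

# Off-the-shelf captioner turns the image into text; any captioning model works here.
captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")
# Stand-in frozen LLM; the paper's setting assumes a much stronger model.
llm = pipeline("text-generation", model="gpt2")

def zero_shot_vqa(image_path: str, question: str) -> str:
    # 1) Image -> textual description.
    caption = captioner(image_path)[0]["generated_text"]
    # 2) Purely textual QA against the frozen LLM (prompt format is an assumption).
    prompt = f"Context: {caption}\nQuestion: {question}\nAnswer:"
    out = llm(prompt, max_new_tokens=10)[0]["generated_text"]
    return out[len(prompt):].strip()

# Example: zero_shot_vqa("photo.jpg", "What color is the car?")
```

No gradients touch either model, which is the point: the LLM stays frozen and all visual grounding enters through the prompt.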
Unbiased scene graph generation from biased training
Today's scene graph generation (SGG) task is still far from practical, mainly due to the severe training bias, e.g., collapsing diverse "human walk on/sit on/lay on beach" into "human …
ViLBERT: Pretraining task-agnostic visiolinguistic representations for vision-and-language tasks
We present ViLBERT (short for Vision-and-Language BERT), a model for learning task-agnostic joint representations of image content and natural language. We extend the …
Unified vision-language pre-training for image captioning and VQA
This paper presents a unified Vision-Language Pre-training (VLP) model. The model is unified in that (1) it can be fine-tuned for either vision-language generation (e.g., image …
WILDS: A benchmark of in-the-wild distribution shifts
Distribution shifts, where the training distribution differs from the test distribution, can substantially degrade the accuracy of machine learning (ML) systems deployed in the wild …