Dissociating language and thought in large language models

K Mahowald, AA Ivanova, IA Blank, N Kanwisher… - Trends in Cognitive …, 2024 - cell.com
Large language models (LLMs) have come closest among all models to date to mastering
human language, yet opinions about their linguistic and cognitive capabilities remain split …

Foundations & trends in multimodal machine learning: Principles, challenges, and open questions

PP Liang, A Zadeh, LP Morency - ACM Computing Surveys, 2024 - dl.acm.org
Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …

ViperGPT: Visual inference via Python execution for reasoning

D Surís, S Menon, C Vondrick - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Answering visual queries is a complex task that requires both visual processing and
reasoning. End-to-end models, the dominant approach for this task, do not explicitly …
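The ViperGPT idea — answering a visual query by executing a generated program rather than running an end-to-end model — can be sketched roughly as follows. This is an illustration only: the vision primitive `find` and the query program are hypothetical stand-ins, not the paper's actual API, and the program an LLM would generate is hard-coded here.

```python
# Rough sketch of code-as-inference for visual queries (hypothetical API;
# the real ViperGPT interface differs). The LLM-generated `program` is
# hard-coded, and the detector is stubbed with toy data.

def find(image, category):
    """Stub detector: return the objects of `category` found in `image`."""
    return [obj for obj in image["objects"] if obj["category"] == category]

def execute_query(image, program):
    """Run a generated Python snippet with vision primitives in scope."""
    scope = {"image": image, "find": find}
    exec(program, scope)
    return scope["answer"]

toy_image = {"objects": [{"category": "mug", "color": "red"},
                         {"category": "mug", "color": "blue"}]}

# A snippet an LLM might emit for the query "How many mugs are there?"
program = "answer = len(find(image, 'mug'))"
print(execute_query(toy_image, program))  # → 2
```

The appeal of this decomposition is that the reasoning trace is an inspectable program, while perception stays inside swappable vision modules.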

Vision-language pre-training: Basics, recent advances, and future trends

Z Gan, L Li, C Li, L Wang, Z Liu… - Foundations and Trends …, 2022 - nowpublishers.com
This monograph surveys vision-language pre-training (VLP) methods for multimodal
intelligence that have been developed in the last few years. We group these approaches …

Selection-inference: Exploiting large language models for interpretable logical reasoning

A Creswell, M Shanahan, I Higgins - arXiv preprint arXiv:2205.09712, 2022 - arxiv.org
Large language models (LLMs) have been shown to be capable of impressive few-shot
generalisation to new tasks. However, they still tend to perform poorly on multi-step logical …
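The multi-step structure named in the title — alternating a selection step (pick relevant facts) with an inference step (derive a new fact) — can be sketched as a toy loop. In the paper both steps are performed by LLMs; here they are stubbed with simple rule matching, and the rule format is an illustrative assumption.

```python
# Toy selection-inference loop (illustrative only; the paper drives both
# steps with LLMs, stubbed here as deterministic rule matching).

def select(facts, rules):
    """Selection step: pick a rule whose premises all hold and whose
    conclusion is not yet known."""
    for premises, conclusion in rules:
        if premises <= facts and conclusion not in facts:
            return premises, conclusion
    return None

def infer(facts, rules, max_steps=10):
    """Alternate selection and inference until no rule applies."""
    facts = set(facts)
    for _ in range(max_steps):
        step = select(facts, rules)
        if step is None:
            break
        _, conclusion = step
        facts.add(conclusion)  # inference step: commit the derived fact
    return facts

rules = [(frozenset({"rainy"}), "wet ground"),
         (frozenset({"wet ground"}), "slippery")]
print(sorted(infer({"rainy"}, rules)))  # → ['rainy', 'slippery', 'wet ground']
```

Each pass through the loop yields one interpretable reasoning step, which is the property the selection-inference framing is after.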

On the opportunities and risks of foundation models

R Bommasani, DA Hudson, E Adeli, R Altman… - arXiv preprint arXiv …, 2021 - arxiv.org
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are
trained on broad data at scale and are adaptable to a wide range of downstream tasks. We …

MDETR: Modulated detection for end-to-end multi-modal understanding

A Kamath, M Singh, Y LeCun… - Proceedings of the …, 2021 - openaccess.thecvf.com
Multi-modal reasoning systems rely on a pre-trained object detector to extract regions of
interest from the image. However, this crucial module is typically used as a black box …

VinVL: Revisiting visual representations in vision-language models

P Zhang, X Li, X Hu, J Yang, L Zhang… - Proceedings of the …, 2021 - openaccess.thecvf.com
This paper presents a detailed study of improving vision features and develops an improved
object detection model for vision language (VL) tasks. Compared to the most widely used …

CPT: Colorful prompt tuning for pre-trained vision-language models

Y Yao, A Zhang, Z Zhang, Z Liu, TS Chua, M Sun - AI Open, 2024 - Elsevier
Vision-Language Pre-training (VLP) models have shown promising capabilities in
grounding natural language in image data, facilitating a broad range of cross-modal tasks …

Oscar: Object-semantics aligned pre-training for vision-language tasks

X Li, X Yin, C Li, P Zhang, X Hu, L Zhang… - Computer Vision–ECCV …, 2020 - Springer
Large-scale pre-training methods of learning cross-modal representations on image-text
pairs are becoming popular for vision-language tasks. While existing methods simply …