Foundations & trends in multimodal machine learning: Principles, challenges, and open questions

PP Liang, A Zadeh, LP Morency - ACM Computing Surveys, 2024 - dl.acm.org
Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …

Vision-language pre-training: Basics, recent advances, and future trends

Z Gan, L Li, C Li, L Wang, Z Liu… - Foundations and Trends …, 2022 - nowpublishers.com
This monograph surveys vision-language pre-training (VLP) methods for multimodal
intelligence that have been developed in the last few years. We group these approaches …

ViperGPT: Visual inference via Python execution for reasoning

D Surís, S Menon, C Vondrick - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Answering visual queries is a complex task that requires both visual processing and
reasoning. End-to-end models, the dominant approach for this task, do not explicitly …
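As a rough illustration of the code-generation idea behind this entry (not ViperGPT's actual API — the vision primitives below are hypothetical stubs; in the real system an LLM writes the program and pretrained vision models back each primitive):

```python
# Sketch: answer a visual query by executing a short Python program that
# calls vision primitives. Both primitives here are hand-coded stand-ins.

def find(image, category):
    """Stub object detector: returns bounding boxes for `category`."""
    fake_scene = {"mug": [(10, 20, 50, 60), (80, 20, 120, 60)],
                  "plate": [(0, 0, 200, 40)]}
    return fake_scene.get(category, [])

def count(boxes):
    """Stub counting primitive."""
    return len(boxes)

# A program an LLM might generate for "How many mugs are in the image?"
def execute_query(image):
    mugs = find(image, "mug")
    return count(mugs)

answer = execute_query(image=None)  # image is unused by these stubs
print(answer)  # prints 2 for the stubbed scene above
```

The point is the control flow, not the stubs: the query is answered by composing interpretable steps rather than by a single end-to-end forward pass.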

Chameleon: Plug-and-play compositional reasoning with large language models

P Lu, B Peng, H Cheng, M Galley… - Advances in …, 2024 - proceedings.neurips.cc
Large language models (LLMs) have achieved remarkable progress in solving various
natural language processing tasks due to emergent reasoning abilities. However, LLMs …

Visual programming: Compositional visual reasoning without training

T Gupta, A Kembhavi - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com
We present VISPROG, a neuro-symbolic approach to solving complex and compositional
visual tasks given natural language instructions. VISPROG avoids the need for any task …

Multiscale feature extraction and fusion of image and text in VQA

S Lu, Y Ding, M Liu, Z Yin, L Yin, W Zheng - International Journal of …, 2023 - Springer
Visual Question Answering (VQA) is the task of finding information in an image that is
relevant to a question in order to answer it correctly. It can be …

Selection-inference: Exploiting large language models for interpretable logical reasoning

A Creswell, M Shanahan, I Higgins - arXiv preprint arXiv:2205.09712, 2022 - arxiv.org
Large language models (LLMs) have been shown to be capable of impressive few-shot
generalisation to new tasks. However, they still tend to perform poorly on multi-step logical …
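The selection-inference loop described in this entry can be sketched as alternating between selecting relevant facts and inferring a new fact from them until the answer is derivable. In the paper both steps are performed by prompted LLMs; here they are hand-coded rules over a toy fact base, purely to show the control flow:

```python
# Toy selection-inference loop over a tiny rule base (hypothetical example,
# not the paper's implementation).

RULES = [("it is raining", "the ground is wet"),
         ("the ground is wet", "shoes get muddy")]

def select(facts):
    """Selection step: pick a rule whose premise is known and whose
    conclusion is not yet derived (stand-in for the LLM selector)."""
    for premise, conclusion in RULES:
        if premise in facts and conclusion not in facts:
            return premise, conclusion
    return None

def infer(selected):
    """Inference step: derive the rule's conclusion
    (stand-in for the LLM inference module)."""
    _premise, conclusion = selected
    return conclusion

def prove(facts, goal, max_steps=5):
    facts = set(facts)
    for _ in range(max_steps):
        if goal in facts:
            return True
        step = select(facts)
        if step is None:
            return False
        facts.add(infer(step))
    return goal in facts

print(prove({"it is raining"}, "shoes get muddy"))  # prints True
```

Because each derived fact is produced by an explicit selection and an explicit inference, the resulting reasoning trace is interpretable step by step, which is the property the paper targets.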

Decomposed prompting: A modular approach for solving complex tasks

T Khot, H Trivedi, M Finlayson, Y Fu… - arXiv preprint arXiv …, 2022 - arxiv.org
Few-shot prompting is a surprisingly powerful way to use Large Language Models (LLMs) to
solve various tasks. However, this approach struggles as the task complexity increases or …
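The modular structure described in this entry can be sketched as a decomposer that splits a complex task into sub-task calls, each dispatched to a dedicated handler. In the paper each handler is a separately few-shot-prompted LLM (or a recursive decomposer); in this hypothetical sketch the handlers are plain functions and the decomposition is hard-coded, using the paper's running example of concatenating the first letters of words:

```python
# Minimal sketch of decomposed-prompting control flow (handlers and the
# decomposition are hand-coded stand-ins for prompted LLM modules).

def split_handler(text):
    """Sub-task: split the input into words."""
    return text.split()

def first_letter_handler(words):
    """Sub-task: take the first letter of each word."""
    return [w[0] for w in words]

def concat_handler(letters):
    """Sub-task: concatenate the letters."""
    return "".join(letters)

HANDLERS = {"split": split_handler,
            "first_letters": first_letter_handler,
            "concat": concat_handler}

def decompose(task_input):
    """Hard-coded plan for 'concatenate the first letters of the words'.
    Each step names a handler; None means 'use the previous result'."""
    return [("split", task_input), ("first_letters", None), ("concat", None)]

def run(task_input):
    result = None
    for name, arg in decompose(task_input):
        result = HANDLERS[name](arg if arg is not None else result)
    return result

print(run("decomposed prompting works"))  # prints "dpw"
```

Keeping each sub-task in its own handler is what lets the approach scale with task complexity: a failing step can be debugged, re-prompted, or recursively decomposed without touching the others.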

On the opportunities and risks of foundation models

R Bommasani, DA Hudson, E Adeli, R Altman… - arXiv preprint arXiv …, 2021 - arxiv.org
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are
trained on broad data at scale and are adaptable to a wide range of downstream tasks. We …

The all-seeing project v2: Towards general relation comprehension of the open world

W Wang, Y Ren, H Luo, T Li, C Yan, Z Chen… - … on Computer Vision, 2024 - Springer
We present the All-Seeing Project V2: a new model and dataset designed for
understanding object relations in images. Specifically, we propose the All-Seeing Model V2 …