- Academic Search

PP Liang, A Zadeh, LP Morency - ACM Computing Surveys, 2024 - dl.acm.org

Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …

Speichern Zitieren Zitiert von: 77 Ähnliche Artikel

[Free GPT-4]

[PDF] nowpublishers.com

Vision-language pre-training: Basics, recent advances, and future trends

Z Gan, L Li, C Li, L Wang, Z Liu… - Foundations and Trends …, 2022 - nowpublishers.com

This monograph surveys vision-language pre-training (VLP) methods for multimodal
intelligence that have been developed in the last few years. We group these approaches …

Speichern Zitieren Zitiert von: 197 Ähnliche Artikel Alle 7 Versionen Bibliothekssuche HTML-Version

[Free GPT-4]

[PDF] thecvf.com

Vipergpt: Visual inference via python execution for reasoning

D Surís, S Menon, C Vondrick - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

Answering visual queries is a complex task that requires both visual processing and
reasoning. End-to-end models, the dominant approach for this task, do not explicitly …

Speichern Zitieren Zitiert von: 416 Ähnliche Artikel Alle 6 Versionen HTML-Version

[Free GPT-4]

[PDF] arxiv.org

Semantic communications for future internet: Fundamentals, applications, and challenges

W Yang, H Du, ZQ Liew, WYB Lim… - … Surveys & Tutorials, 2022 - ieeexplore.ieee.org

With the increasing demand for intelligent services, the sixth-generation (6G) wireless
networks will shift from a traditional architecture that focuses solely on a high transmission …

Speichern Zitieren Zitiert von: 332 Ähnliche Artikel Alle 4 Versionen

[Free GPT-4]

[PDF] thecvf.com

Crosspoint: Self-supervised cross-modal contrastive learning for 3d point cloud understanding

M Afham, I Dissanayake… - Proceedings of the …, 2022 - openaccess.thecvf.com

Manual annotation of large-scale point cloud dataset for varying tasks such as 3D object
classification, segmentation and detection is often laborious owing to the irregular structure …

Speichern Zitieren Zitiert von: 316 Ähnliche Artikel Alle 7 Versionen HTML-Version

[Free GPT-4]

[PDF] arxiv.org

On the opportunities and risks of foundation models

R Bommasani, DA Hudson, E Adeli, R Altman… - arxiv preprint arxiv …, 2021 - arxiv.org

AI is undergoing a paradigm shift with the rise of models (eg, BERT, DALL-E, GPT-3) that are
trained on broad data at scale and are adaptable to a wide range of downstream tasks. We …

Speichern Zitieren Zitiert von: 4714 Ähnliche Artikel Alle 2 Versionen HTML-Version

[Free GPT-4]

[PDF] thecvf.com

Mdetr-modulated detection for end-to-end multi-modal understanding

A Kamath, M Singh, Y LeCun… - Proceedings of the …, 2021 - openaccess.thecvf.com

Multi-modal reasoning systems rely on a pre-trained object detector to extract regions of
interest from the image. However, this crucial module is typically used as a black box …

Speichern Zitieren Zitiert von: 931 Ähnliche Artikel Alle 10 Versionen HTML-Version

[Free GPT-4]

[PDF] arxiv.org

Ai alignment: A comprehensive survey

J Ji, T Qiu, B Chen, B Zhang, H Lou, K Wang… - arxiv preprint arxiv …, 2023 - arxiv.org

AI alignment aims to make AI systems behave in line with human intentions and values. As
AI systems grow more capable, the potential large-scale risks associated with misaligned AI …

Speichern Zitieren Zitiert von: 227 Ähnliche Artikel Alle 3 Versionen HTML-Version

[Free GPT-4]

[PDF] arxiv.org

Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions

PP Liang, A Zadeh, LP Morency - arxiv preprint arxiv:2209.03430, 2022 - arxiv.org

Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …

Speichern Zitieren Zitiert von: 151 Ähnliche Artikel Alle 2 Versionen HTML-Version

[Free GPT-4]

[PDF] neurips.cc

Large-scale adversarial training for vision-and-language representation learning

Z Gan, YC Chen, L Li, C Zhu… - Advances in Neural …, 2020 - proceedings.neurips.cc

We present VILLA, the first known effort on large-scale adversarial training for vision-and-
language (V+ L) representation learning. VILLA consists of two training stages:(i) task …

Speichern Zitieren Zitiert von: 554 Ähnliche Artikel Alle 8 Versionen HTML-Version

Alert erstellen

Zitieren

Erweiterte Suche

In „Meine Bibliothek“ gespeichert

Compositional attention networks for machine reasoning

Foundations & trends in multimodal machine learning: Principles, challenges, and open questions

Vision-language pre-training: Basics, recent advances, and future trends

Vipergpt: Visual inference via python execution for reasoning

Semantic communications for future internet: Fundamentals, applications, and challenges

Crosspoint: Self-supervised cross-modal contrastive learning for 3d point cloud understanding

On the opportunities and risks of foundation models

Mdetr-modulated detection for end-to-end multi-modal understanding

Ai alignment: A comprehensive survey

Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions

Large-scale adversarial training for vision-and-language representation learning