Foundations & trends in multimodal machine learning: Principles, challenges, and open questions

PP Liang, A Zadeh, LP Morency - ACM Computing Surveys, 2024 - dl.acm.org
Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …

The multi-modal fusion in visual question answering: a review of attention mechanisms

S Lu, M Liu, L Yin, Z Yin, X Liu, W Zheng - PeerJ Computer Science, 2023 - peerj.com
Visual Question Answering (VQA) is a significant cross-disciplinary issue in the
fields of computer vision and natural language processing that requires a computer to output …

Vision-language pre-training: Basics, recent advances, and future trends

Z Gan, L Li, C Li, L Wang, Z Liu… - Foundations and Trends …, 2022 - nowpublishers.com
This monograph surveys vision-language pre-training (VLP) methods for multimodal
intelligence that have been developed in the last few years. We group these approaches …

A metaverse: Taxonomy, components, applications, and open challenges

SM Park, YG Kim - IEEE Access, 2022 - ieeexplore.ieee.org
Unlike previous studies on the Metaverse based on Second Life, the current Metaverse is
based on the social value of Generation Z that online and offline selves are not different …

[PDF] Large-scale domain-specific pretraining for biomedical vision-language processing

S Zhang, Y Xu, N Usuyama, J Bagga… - arXiv preprint arXiv …, 2023 - researchgate.net
Contrastive pretraining on parallel image-text data has attained great success in vision-
language processing (VLP), as exemplified by CLIP and related methods. However, prior …

Focal self-attention for local-global interactions in vision transformers

J Yang, C Li, P Zhang, X Dai, B Xiao, L Yuan… - arXiv preprint arXiv …, 2021 - arxiv.org
Recently, Vision Transformer and its variants have shown great promise on various
computer vision tasks. The ability of capturing short- and long-range visual dependencies …

Scaling up visual and vision-language representation learning with noisy text supervision

C Jia, Y Yang, Y Xia, YT Chen… - International …, 2021 - proceedings.mlr.press
Pre-trained representations are becoming crucial for many NLP and perception tasks. While
representation learning in NLP has transitioned to training on raw text without human …

ViTAE: Vision transformer advanced by exploring intrinsic inductive bias

Y Xu, Q Zhang, J Zhang, D Tao - Advances in neural …, 2021 - proceedings.neurips.cc
Transformers have shown great potential in various computer vision tasks owing to their
strong capability in modeling long-range dependency using the self-attention mechanism …

Mental health analysis in social media posts: a survey

M Garg - Archives of Computational Methods in Engineering, 2023 - Springer
The surge in internet use to express personal thoughts and beliefs makes it increasingly
feasible for the social NLP research community to find and validate associations between …