Google Scholar

Detecting harmful content on online platforms: what platforms need vs. where research efforts go

A Arora, P Nakov, M Hardalov, SM Sarwar… - ACM Computing …, 2023 - dl.acm.org

The proliferation of harmful content on online platforms is a major societal problem, which
comes in many different forms, including hate speech, offensive language, bullying and …

Cited by 63 · Related articles · All 6 versions

[PDF] mit.edu

Multimodal pretraining unmasked: A meta-analysis and a unified framework of vision-and-language BERTs

E Bugliarello, R Cotterell, N Okazaki… - Transactions of the …, 2021 - direct.mit.edu

Large-scale pretraining and task-specific fine-tuning is now the standard methodology for
many tasks in computer vision and natural language processing. Recently, a multitude of …

Cited by 161 · Related articles · All 9 versions

[PDF] acm.org

Dual scene graph convolutional network for motivation prediction

Y Wanyan, X Yang, X Ma, C Xu - ACM Transactions on Multimedia …, 2023 - dl.acm.org

Humans can easily infer the motivations behind human actions from only visual data by
comprehensively analyzing the complex context information and utilizing abundant life …

Cited by 4 · Related articles

Achieving Human Parity on Visual Question Answering

M Yan, H Xu, C Li, J Tian, B Bi, W Wang, X Xu… - ACM Transactions on …, 2023 - dl.acm.org

The Visual Question Answering (VQA) task utilizes both visual image and language analysis
to answer a textual question with respect to an image. It has been a popular research topic …

Cited by 6 · Related articles

Knowledge-integrated Multi-modal Movie Turning Point Identification

D Wang, R Xu, L Cheng, Z Wang - ACM Transactions on Multimedia …, 2024 - dl.acm.org

The rapid development of artificial intelligence provides rich technologies and tools for the
automated understanding of literary works. As a comprehensive carrier of storylines, movies …

Related articles · All 2 versions

ERNIE-ViL: Knowledge enhanced vision-language representations through scene graphs

Detecting harmful content on online platforms: what platforms need vs. where research efforts go
