- Academic Search

AdaShield: Safeguarding multimodal large language models from structure-based attack via adaptive shield prompting

Y Wang, X Liu, Y Li, M Chen, C Xiao - European Conference on Computer …, 2024 - Springer

With the advent and widespread deployment of Multimodal Large Language Models
(MLLMs), the imperative to ensure their safety has become increasingly pronounced …

Cited by 39 (6 versions)

Video understanding with large language models: A survey

Y Tang, J Bi, S Xu, L Song, S Liang, T Wang… - arxiv preprint arxiv …, 2023 - arxiv.org

With the burgeoning growth of online video platforms and the escalating volume of video
content, the demand for proficient video understanding tools has intensified markedly. Given …

Cited by 62 (3 versions)

Gemini Pro defeated by GPT-4V: Evidence from education

GG Lee, E Latif, L Shi, X Zhai - arxiv preprint arxiv:2401.08660, 2023 - arxiv.org

This study compared the classification performance of Gemini Pro and GPT-4V in
educational settings. Employing visual question answering (VQA) techniques, the study …

Cited by 34 (7 versions)

GPT4Vis: what can GPT-4 do for zero-shot visual recognition?

W Wu, H Yao, M Zhang, Y Song, W Ouyang… - arxiv preprint arxiv …, 2023 - arxiv.org

This paper does not present a novel method. Instead, it delves into an essential, yet must-know
baseline in light of the latest advancements in Generative Artificial Intelligence …

Cited by 30 (2 versions)

VideoVista: A versatile benchmark for video understanding and reasoning

Y Li, X Chen, B Hu, L Wang, H Shi, M Zhang - arxiv preprint arxiv …, 2024 - arxiv.org

Despite significant breakthroughs in video analysis driven by the rapid development of large
multimodal models (LMMs), there remains a lack of a versatile evaluation benchmark to …

Cited by 17 (3 versions)

CoCoT: Contrastive chain-of-thought prompting for large multimodal models with multiple image inputs

D Zhang, J Yang, H Lyu, Z Jin, Y Yao, M Chen… - arxiv preprint arxiv …, 2024 - arxiv.org

When exploring the development of Artificial General Intelligence (AGI), a critical task for
these models involves interpreting and processing information from multiple image inputs …

Cited by 29 (3 versions)

Visual-RolePlay: Universal jailbreak attack on multimodal large language models via role-playing image character

S Ma, W Luo, Y Wang, X Liu - arxiv preprint arxiv:2405.20773, 2024 - arxiv.org

With the advent and widespread deployment of Multimodal Large Language Models
(MLLMs), ensuring their safety has become increasingly critical. To achieve this objective, it …

Cited by 20 (2 versions)

FakingRecipe: Detecting fake news on short video platforms from the perspective of creative process

Y Bu, Q Sheng, J Cao, P Qi, D Wang, J Li - Proceedings of the 32nd …, 2024 - dl.acm.org

As short-form video-sharing platforms become a significant channel for news consumption,
fake news in short videos has emerged as a serious threat in the online information …

Cited by 5 (4 versions)

GPT4Ego: unleashing the potential of pre-trained models for zero-shot egocentric action recognition

G Dai, X Shu, W Wu, R Yan… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

Vision-Language Models (VLMs), pre-trained on large-scale datasets, have shown
impressive performance in various visual recognition tasks. This advancement paves the …

Cited by 7 (4 versions)

Machine-generated text localization

Z Zhang, W Qin, BA Plummer - arxiv preprint arxiv:2402.11744, 2024 - arxiv.org

Machine-Generated Text (MGT) detection aims to identify a piece of text as machine or
human written. Prior work has primarily formulated MGT detection as a binary classification …

Cited by 6 (4 versions)

GPT-4V(ision) as a social media analysis engine
