AesExpert: Towards Multi-modality Foundation Model for Image Aesthetics Perception

Y Huang, X Sheng, Z Yang, Q Yuan, Z Duan… - Proceedings of the …, 2024 - dl.acm.org
The highly abstract nature of image aesthetics perception (IAP) poses a significant
challenge for current multimodal large language models (MLLMs). The lack of human …

AesBench: An Expert Benchmark for Multimodal Large Language Models on Image Aesthetics Perception

Y Huang, Q Yuan, X Sheng, Z Yang, H Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
With collective endeavors, multimodal large language models (MLLMs) are undergoing a
flourishing development. However, their performances on image aesthetics perception …

The Call for Socially Aware Language Technologies

D Yang, D Hovy, D Jurgens, B Plank - arXiv preprint arXiv:2405.02411, 2024 - arxiv.org
Language technologies have made enormous progress, especially with the introduction of
large language models (LLMs). On traditional tasks such as machine translation and …

Vanessa: Visual Connotation and Aesthetic Attributes Understanding Network for Multimodal Aspect-based Sentiment Analysis

L Xiao, R Mao, X Zhang, L He… - Findings of the …, 2024 - aclanthology.org
Prevailing research concentrates on superficial features or descriptions of images, revealing
a significant gap in the systematic exploration of their connotative and aesthetic attributes …

Culturally Aware and Adapted NLP: A Taxonomy and a Survey of the State of the Art

CC Liu, I Gurevych, A Korhonen - arXiv preprint arXiv:2406.03930, 2024 - arxiv.org
The surge of interest in culturally aware and adapted Natural Language Processing (NLP)
has inspired much recent research. However, the lack of common understanding of the …

Can Large Multimodal Models Uncover Deep Semantics Behind Images?

Y Yang, Z Li, Q Dong, H Xia, Z Sui - arXiv preprint arXiv:2402.11281, 2024 - arxiv.org
Understanding the deep semantics of images is essential in the era dominated by social
media. However, current research works primarily on the superficial description of images …

Are we at a Multimodal Turn?

M Wevers, T Smits - 2024 - indico.cern.ch
Just as OCR did for text, multimodal AI provides bottom-up access to visual culture.
Overview allows us to identify patterns of presence and absence: who and what is visible …