A Survey of Multimodel Large Language Models

Z Liang, Y Xu, Y Hong, P Shang, Q Wang… - Proceedings of the 3rd …, 2024 - dl.acm.org
With the widespread application of the Transformer architecture in various modalities,
including vision, the technology of large language models is evolving from a single modality …

Multimodal sentiment analysis: a survey of methods, trends, and challenges

R Das, TD Singh - ACM Computing Surveys, 2023 - dl.acm.org
Sentiment analysis has come a long way since it was introduced as a natural language
processing task nearly 20 years ago. Sentiment analysis aims to extract the underlying …

Grit: Faster and better image captioning transformer using dual visual features

VQ Nguyen, M Suganuma, T Okatani - European Conference on Computer …, 2022 - Springer
Current state-of-the-art methods for image captioning employ region-based features, as they
provide object-level information that is essential to describe the content of images; they are …

A comprehensive survey of deep learning for image captioning

MDZ Hossain, F Sohel, MF Shiratuddin… - ACM Computing Surveys …, 2019 - dl.acm.org
Generating a description of an image is called image captioning. Image captioning requires
recognizing the important objects, their attributes, and their relationships in an image. It also …

The design and implementation of XiaoIce, an empathetic social chatbot

L Zhou, J Gao, D Li, HY Shum - Computational Linguistics, 2020 - direct.mit.edu
This article describes the development of Microsoft XiaoIce, the most popular social chatbot
in the world. XiaoIce is uniquely designed as an artificial intelligence companion with an …

Caption anything: Interactive image description with diverse multimodal controls

T Wang, J Zhang, J Fei, H Zheng, Y Tang, Z Li… - arXiv preprint arXiv …, 2023 - arxiv.org
Controllable image captioning is an emerging multimodal topic that aims to describe the
image with natural language following human purpose, e.g., looking at the …

From Eliza to XiaoIce: challenges and opportunities with social chatbots

HY Shum, X He, D Li - Frontiers of Information Technology & Electronic …, 2018 - Springer
Conversational systems have come a long way since their inception in the 1960s. After
decades of research and development, we have seen progress from Eliza and Parry in the …

Say as you wish: Fine-grained control of image caption generation with abstract scene graphs

S Chen, Q Jin, P Wang, Q Wu - Proceedings of the IEEE …, 2020 - openaccess.thecvf.com
Humans are able to describe image contents with coarse to fine details as they wish.
However, most image captioning models are intention-agnostic which cannot generate …

Deep learning approaches on image captioning: A review

T Ghandi, H Pourreza, H Mahyar - ACM Computing Surveys, 2023 - dl.acm.org
Image captioning is a research area of immense importance, aiming to generate natural
language descriptions for visual content in the form of still images. The advent of deep …

A survey of multimodal sentiment analysis

M Soleymani, D Garcia, B Jou, B Schuller… - Image and Vision …, 2017 - Elsevier
Sentiment analysis aims to automatically uncover the underlying attitude that we hold
towards an entity. The aggregation of these sentiments over a population represents opinion …