A survey on multi-modal summarization

A Jangra, S Mukherjee, A Jatowt, S Saha… - ACM Computing …, 2023 - dl.acm.org
The new era of technology has brought us to the point where it is convenient for people to
share their opinions over an abundance of platforms. These platforms have a provision for …

A general survey on attention mechanisms in deep learning

G Brauwers, F Frasincar - IEEE Transactions on Knowledge …, 2021 - ieeexplore.ieee.org
Attention is an important mechanism that can be employed for a variety of deep learning
models across many different domains and tasks. This survey provides an overview of the …

Good visual guidance makes a better extractor: Hierarchical visual prefix for multimodal entity and relation extraction

X Chen, N Zhang, L Li, Y Yao, S Deng, C Tan… - arXiv preprint arXiv …, 2022 - arxiv.org
Multimodal named entity recognition and relation extraction (MNER and MRE) is a
fundamental and crucial branch in information extraction. However, existing approaches for …

RpBERT: a text-image relation propagation-based BERT model for multimodal NER

L Sun, J Wang, K Zhang, Y Su, F Weng - Proceedings of the AAAI …, 2021 - ojs.aaai.org
Recently multimodal named entity recognition (MNER) has utilized images to improve the
accuracy of NER in tweets. However, most of the multimodal methods use attention …

MNER-QG: An end-to-end MRC framework for multimodal named entity recognition with query grounding

M Jia, L Shen, X Shen, L Liao, M Chen, X He… - Proceedings of the …, 2023 - ojs.aaai.org
Multimodal named entity recognition (MNER) is a critical step in information extraction,
which aims to detect entity spans and classify them to corresponding entity types given a …

Query prior matters: A MRC framework for multimodal named entity recognition

M Jia, X Shen, L Shen, J Pang, L Liao, Y Song… - Proceedings of the 30th …, 2022 - dl.acm.org
Multimodal named entity recognition (MNER) is a vision-language task where the system is
required to detect entity spans and corresponding entity types given a sentence-image pair …

Multimodal aspect-based sentiment analysis: a survey of tasks, methods, challenges and future directions

T Zhao, L Meng, D Song - Information Fusion, 2024 - Elsevier
With the development of social media, users increasingly tend to express their sentiments
(broadly including sentiment polarities, emotions and sarcasm, etc.) associated with fine …

Multimodal named entity recognition with image attributes and image knowledge

D Chen, Z Li, B Gu, Z Chen - … , DASFAA 2021, Taipei, Taiwan, April 11–14 …, 2021 - Springer
Multimodal named entity extraction is an emerging task which uses both textual and visual
information to detect named entities and identify their entity types. The existing efforts are …

A large-scale Chinese multimodal NER dataset with speech clues

D Sui, Z Tian, Y Chen, K Liu, J Zhao - Proceedings of the 59th …, 2021 - aclanthology.org
In this paper, we aim to explore an uncharted territory, which is Chinese multimodal named
entity recognition (NER) with both textual and acoustic contents. To achieve this, we …

UMIE: Unified multimodal information extraction with instruction tuning

L Sun, K Zhang, Q Li, R Lou - Proceedings of the AAAI Conference on …, 2024 - ojs.aaai.org
Multimodal information extraction (MIE) gains significant attention as the popularity of
multimedia content increases. However, current MIE methods often resort to using task …