Abstractive text summarization: State of the art, challenges, and improvements
Specifically focusing on the landscape of abstractive text summarization, as opposed to
extractive techniques, this survey presents a comprehensive overview, delving into state-of …
Summary-oriented vision modeling for multimodal abstractive summarization
Multimodal abstractive summarization (MAS) aims to produce a concise summary given the
multimodal data (text and vision). Existing studies mainly focus on how to effectively use the …
Recent trends of multimodal affective computing: A survey from NLP perspective
Multimodal affective computing (MAC) has garnered increasing attention due to its broad
applications in analyzing human behaviors and intentions, especially in text-dominated …
UniSA: Unified generative framework for sentiment analysis
Sentiment analysis is a crucial task that aims to understand people's emotional states and
predict emotional categories based on multimodal information. It consists of several …
Multi-task hierarchical heterogeneous fusion framework for multimodal summarization
With the rise of multimedia content on the internet, Multimodal Summarization has become a
challenging task to help individuals grasp vital information fast. However, previous methods …
Exploiting pseudo image captions for multimodal summarization
Cross-modal contrastive learning in vision language pretraining (VLP) faces the challenge
of (partial) false negatives. In this paper, we study this problem from the perspective of …
DTV: Dual Knowledge Distillation and Target-oriented Vision Modeling for Many-to-Many Multimodal Summarization
The many-to-many multimodal summarization (M$^3$S) task aims to generate summaries in
any language with document inputs in any language and the corresponding image …
HyperPELT: Unified parameter-efficient language model tuning for both language and vision-and-language tasks
The workflow of pretraining and fine-tuning has emerged as a popular paradigm for solving
various NLP and V&L (Vision-and-Language) downstream tasks. With the capacity of …
Evaluating and improving factuality in multimodal abstractive summarization
Current metrics for evaluating factuality for abstractive document summarization have
achieved high correlations with human judgment, but they do not account for the vision …
UniMEEC: Towards unified multimodal emotion recognition and emotion cause
Multimodal emotion recognition in conversation (MERC) and multimodal emotion-cause pair
extraction (MECPE) have recently garnered significant attention. Emotions are the …