LayoutLLM-T2I: Eliciting layout guidance from LLM for text-to-image generation
In the text-to-image generation field, recent remarkable progress in Stable Diffusion makes it
possible to generate rich kinds of novel photorealistic images. However, current models still …
Revisiting disentanglement and fusion on modality and context in conversational multimodal emotion recognition
It has been a hot research topic to enable machines to understand human emotions in
multimodal contexts under dialogue scenarios, which is tasked with multimodal emotion …
Constructing holistic spatio-temporal scene graph for video semantic role labeling
As one of the core video semantic understanding tasks, Video Semantic Role Labeling
(VidSRL) aims to detect the salient events from given videos, by recognizing the predict …
Video-of-thought: Step-by-step video reasoning from perception to cognition
Existing research of video understanding still struggles to achieve in-depth comprehension
and reasoning in complex videos, primarily due to the under-exploration of two key …
Semi-supervised panoptic narrative grounding
Despite considerable progress, the advancement of Panoptic Narrative Grounding (PNG)
remains hindered by costly annotations. In this paper, we introduce a novel Semi …
A cross-guidance cross-lingual model on generated parallel corpus for classical Chinese machine reading comprehension
Chinese diachronic gap is a key issue in classical Chinese machine reading
comprehension (CCMRC). Preceding work on bridging this gap has been mostly restricted …
Contrastive Multi-View Interest Learning for Cross-Domain Sequential Recommendation
Cross-domain recommendation (CDR), which leverages information collected from other
domains, has been empirically demonstrated to effectively alleviate data sparsity and cold …
Improving Panoptic Narrative Grounding by Harnessing Semantic Relationships and Visual Confirmation
Recent advancements in single-stage Panoptic Narrative Grounding (PNG) have
demonstrated significant potential. These methods predict pixel-level masks by directly …
Event-centric hierarchical hyperbolic graph for multi-hop question answering over knowledge graphs
X Zhu, W Gao, T Li, W Yao, H Deng - Engineering Applications of Artificial …, 2024 - Elsevier
Question Answering over Knowledge Graphs (KGQA) blends natural language
processing with structured knowledge representation. While much attention of existing …
SpeechEE: A Novel Benchmark for Speech Event Extraction
Event extraction (EE) is a critical direction in the field of information extraction, laying an
important foundation for the construction of structured knowledge bases. EE from text has …