LayoutLLM-T2I: Eliciting layout guidance from LLM for text-to-image generation

L Qu, S Wu, H Fei, L Nie, TS Chua - Proceedings of the 31st ACM …, 2023 - dl.acm.org
In the text-to-image generation field, recent remarkable progress in Stable Diffusion makes it
possible to generate rich kinds of novel photorealistic images. However, current models still …

Revisiting disentanglement and fusion on modality and context in conversational multimodal emotion recognition

B Li, H Fei, L Liao, Y Zhao, C Teng, TS Chua… - Proceedings of the 31st …, 2023 - dl.acm.org
It has been a hot research topic to enable machines to understand human emotions in
multimodal contexts under dialogue scenarios, which is tasked with multimodal emotion …

Constructing holistic spatio-temporal scene graph for video semantic role labeling

Y Zhao, H Fei, Y Cao, B Li, M Zhang, J Wei… - Proceedings of the 31st …, 2023 - dl.acm.org
As one of the core video semantic understanding tasks, Video Semantic Role Labeling
(VidSRL) aims to detect the salient events from given videos, by recognizing the predict …

Video-of-thought: Step-by-step video reasoning from perception to cognition

H Fei, S Wu, W Ji, H Zhang, M Zhang… - Forty-first International …, 2024 - openreview.net
Existing research of video understanding still struggles to achieve in-depth comprehension
and reasoning in complex videos, primarily due to the under-exploration of two key …

Semi-supervised panoptic narrative grounding

D Yang, J Ji, X Sun, H Wang, Y Li, Y Ma… - Proceedings of the 31st …, 2023 - dl.acm.org
Despite considerable progress, the advancement of Panoptic Narrative Grounding (PNG)
remains hindered by costly annotations. In this paper, we introduce a novel Semi …

A cross-guidance cross-lingual model on generated parallel corpus for classical Chinese machine reading comprehension

J Xiang, M Liu, Q Li, C Qiu, H Hu - Information Processing & Management, 2024 - Elsevier
Chinese diachronic gap is a key issue in classical Chinese machine reading
comprehension (CCMRC). Preceding work on bridging this gap has been mostly restricted …

Contrastive Multi-View Interest Learning for Cross-Domain Sequential Recommendation

T Zang, Y Zhu, R Zhang, C Wang, K Wang… - ACM Transactions on …, 2023 - dl.acm.org
Cross-domain recommendation (CDR), which leverages information collected from other
domains, has been empirically demonstrated to effectively alleviate data sparsity and cold …

Improving Panoptic Narrative Grounding by Harnessing Semantic Relationships and Visual Confirmation

T Guo, H Wang, Y Ma, J Ji, X Sun - … of the AAAI Conference on Artificial …, 2024 - ojs.aaai.org
Recent advancements in single-stage Panoptic Narrative Grounding (PNG) have
demonstrated significant potential. These methods predict pixel-level masks by directly …

Event-centric hierarchical hyperbolic graph for multi-hop question answering over knowledge graphs

X Zhu, W Gao, T Li, W Yao, H Deng - Engineering Applications of Artificial …, 2024 - Elsevier
Question Answering over Knowledge Graphs (KGQA) blends natural language
processing with structured knowledge representation. While much attention of existing …

SpeechEE: A Novel Benchmark for Speech Event Extraction

B Wang, M Zhang, H Fei, Y Zhao, B Li, S Wu… - Proceedings of the …, 2024 - dl.acm.org
Event extraction (EE) is a critical direction in the field of information extraction, laying an
important foundation for the construction of structured knowledge bases. EE from text has …