Google Scholar

Cross2StrA: Unpaired cross-lingual image captioning with cross-lingual cross-modal structure-pivo...

Layoutllm-t2i: Eliciting layout guidance from llm for text-to-image generation

L Qu, S Wu, H Fei, L Nie, TS Chua - Proceedings of the 31st ACM …, 2023 - dl.acm.org

In the text-to-image generation field, recent remarkable progress in Stable Diffusion makes it
possible to generate rich kinds of novel photorealistic images. However, current models still …

Cited by 95

[PDF] arxiv.org

Revisiting disentanglement and fusion on modality and context in conversational multimodal emotion recognition

B Li, H Fei, L Liao, Y Zhao, C Teng, TS Chua… - Proceedings of the 31st …, 2023 - dl.acm.org

It has been a hot research topic to enable machines to understand human emotions in
multimodal contexts under dialogue scenarios, which is tasked with multimodal emotion …

Cited by 68

[PDF] acm.org

Constructing holistic spatio-temporal scene graph for video semantic role labeling

Y Zhao, H Fei, Y Cao, B Li, M Zhang, J Wei… - Proceedings of the 31st …, 2023 - dl.acm.org

As one of the core video semantic understanding tasks, Video Semantic Role Labeling
(VidSRL) aims to detect the salient events from given videos, by recognizing the predict …

Cited by 40

[PDF] openreview.net

Video-of-thought: Step-by-step video reasoning from perception to cognition

H Fei, S Wu, W Ji, H Zhang, M Zhang… - Forty-first International …, 2024 - openreview.net

Existing research of video understanding still struggles to achieve in-depth comprehension
and reasoning in complex videos, primarily due to the under-exploration of two key …

Cited by 62

[PDF] arxiv.org

Semi-supervised panoptic narrative grounding

D Yang, J Ji, X Sun, H Wang, Y Li, Y Ma… - Proceedings of the 31st …, 2023 - dl.acm.org

Despite considerable progress, the advancement of Panoptic Narrative Grounding (PNG)
remains hindered by costly annotations. In this paper, we introduce a novel Semi …

Cited by 8

A cross-guidance cross-lingual model on generated parallel corpus for classical Chinese machine reading comprehension

J Xiang, M Liu, Q Li, C Qiu, H Hu - Information Processing & Management, 2024 - Elsevier

Chinese diachronic gap is a key issue in classical Chinese machine reading
comprehension (CCMRC). Preceding work on bridging this gap has been mostly restricted …

Cited by 6

Contrastive Multi-View Interest Learning for Cross-Domain Sequential Recommendation

T Zang, Y Zhu, R Zhang, C Wang, K Wang… - ACM Transactions on …, 2023 - dl.acm.org

Cross-domain recommendation (CDR), which leverages information collected from other
domains, has been empirically demonstrated to effectively alleviate data sparsity and cold …

Cited by 7

[PDF] aaai.org

Improving Panoptic Narrative Grounding by Harnessing Semantic Relationships and Visual Confirmation

T Guo, H Wang, Y Ma, J Ji, X Sun - … of the AAAI Conference on Artificial …, 2024 - ojs.aaai.org

Recent advancements in single-stage Panoptic Narrative Grounding (PNG) have
demonstrated significant potential. These methods predict pixel-level masks by directly …

Cited by 3

Event-centric hierarchical hyperbolic graph for multi-hop question answering over knowledge graphs

X Zhu, W Gao, T Li, W Yao, H Deng - Engineering Applications of Artificial …, 2024 - Elsevier

Question Answering over Knowledge Graphs (KGQA) blends natural language
processing with structured knowledge representation. While much attention of existing …

Cited by 2

[PDF] arxiv.org

SpeechEE: A Novel Benchmark for Speech Event Extraction

B Wang, M Zhang, H Fei, Y Zhao, B Li, S Wu… - Proceedings of the …, 2024 - dl.acm.org

Event extraction (EE) is a critical direction in the field of information extraction, laying an
important foundation for the construction of structured knowledge bases. EE from text has …

Cited by 1
