Video-of-Thought: Step-by-step video reasoning from perception to cognition

H Fei, S Wu, W Ji, H Zhang, M Zhang, ML Lee… - arXiv preprint arXiv …, 2024 - arxiv.org
Existing research on video understanding still struggles to achieve in-depth comprehension
and reasoning in complex videos, primarily due to the under-exploration of two key …

Dysen-VDM: Empowering dynamics-aware text-to-video diffusion with LLMs

H Fei, S Wu, W Ji, H Zhang… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Text-to-video (T2V) synthesis has gained increasing attention in the community, in
which the recently emerged diffusion models (DMs) have promisingly shown stronger …

DreamLIP: Language-image pre-training with long captions

K Zheng, Y Zhang, W Wu, F Lu, S Ma, X … - … on Computer Vision, 2024 - Springer
Language-image pre-training largely relies on how precisely and thoroughly a text
describes its paired image. In practice, however, the contents of an image can be so rich that …

Vitron: A unified pixel-level vision llm for understanding, generating, segmenting, editing

H Fei, S Wu, H Zhang, TS Chua, S Yan - arXiv preprint arXiv:2412.19806, 2024 - arxiv.org
Recent developments of vision large language models (LLMs) have seen remarkable
progress, yet still encounter challenges towards multimodal generalists, such as coarse …

PanoSent: A panoptic sextuple extraction benchmark for multimodal conversational aspect-based sentiment analysis

M Luo, H Fei, B Li, S Wu, Q Liu, S Poria… - Proceedings of the …, 2024 - dl.acm.org
While existing Aspect-based Sentiment Analysis (ABSA) has received extensive effort and
advancement, there are still gaps in defining a more holistic research target seamlessly …

Who evaluates the evaluations? Objectively scoring text-to-image prompt coherence metrics with T2IScoreScore (TS2)

M Saxon, F Jahara, M Khoshnoodi, Y Lu… - arXiv preprint arXiv …, 2024 - arxiv.org
With advances in the quality of text-to-image (T2I) models has come interest in
benchmarking their prompt faithfulness, the semantic coherence of generated images to the …

NUS-Emo at SemEval-2024 Task 3: Instruction-tuning LLM for multimodal emotion-cause analysis in conversations

M Luo, H Zhang, S Wu, B Li, H Han, H Fei - arXiv preprint arXiv …, 2024 - arxiv.org
This paper describes the architecture of our system developed for Task 3 of SemEval-2024:
Multimodal Emotion-Cause Analysis in Conversations. Our project targets the challenges of …

Modeling implicit variable and latent structure for aspect-based sentiment quadruple extraction

Y Nie, J Fu, Y Zhang, C Li - Neurocomputing, 2024 - Elsevier
The realm of aspect-based sentiment analysis (ABSA), which delves into the nuanced
sentiment expressions individuals hold towards specific services or products, has …

Multimodal emotion-cause pair extraction with holistic interaction and label constraint

B Li, H Fei, F Li, T Chua, D Ji - ACM Transactions on Multimedia …, 2024 - dl.acm.org
The multimodal emotion-cause pair extraction (MECPE) task aims to detect the emotions,
causes, and emotion-cause pairs from multimodal conversations. Existing methods for this …

SpeechEE: A Novel Benchmark for Speech Event Extraction

B Wang, M Zhang, H Fei, Y Zhao, B Li, S Wu… - Proceedings of the …, 2024 - dl.acm.org
Event extraction (EE) is a critical direction in the field of information extraction, laying an
important foundation for the construction of structured knowledge bases. EE from text has …