Alleviating hallucination in large vision-language models with active retrieval augmentation

X Qu, Q Chen, W Wei, J Sun, J Dong - arXiv preprint arXiv:2408.00555, 2024 - arxiv.org
Despite the remarkable ability of large vision-language models (LVLMs) in image
comprehension, these models frequently generate plausible yet factually incorrect …

Decomposed prototype learning for few-shot scene graph generation

X Li, J Xiao, G Chen, Y Feng, Y Yang, AA Liu… - ACM Transactions on …, 2024 - dl.acm.org
Today's scene graph generation (SGG) models typically require abundant manual
annotations to learn new predicate types. Therefore, it is difficult to apply them to real-world …

LayoutEnc: Leveraging Enhanced Layout Representations for Transformer-based Complex Scene Synthesis

X Cui, Q Sun, M Wang, L Li, W Zhou, H Li - ACM Transactions on …, 2025 - dl.acm.org
In complex scene synthesis, the effective representation of layouts is paramount. This paper
introduces LayoutEnc, an advanced approach specifically designed to enhance layout …