Alleviating hallucination in large vision-language models with active retrieval augmentation
Despite the remarkable ability of large vision-language models (LVLMs) in image
comprehension, these models frequently generate plausible yet factually incorrect …
Decomposed prototype learning for few-shot scene graph generation
Today's scene graph generation (SGG) models typically require abundant manual
annotations to learn new predicate types. Therefore, it is difficult to apply them to real-world …
LayoutEnc: Leveraging Enhanced Layout Representations for Transformer-based Complex Scene Synthesis
X Cui, Q Sun, M Wang, L Li, W Zhou, H Li - ACM Transactions on …, 2025 - dl.acm.org
In complex scene synthesis, the effective representation of layouts is paramount. This paper
introduces LayoutEnc, an advanced approach specifically designed to enhance layout …