SAI3D: Segment any instance in 3D scenes

Y Yin, Y Liu, Y Xiao, D Cohen-Or… - Proceedings of the …, 2024 - openaccess.thecvf.com
Advancements in 3D instance segmentation have traditionally been tethered to the
availability of annotated datasets, limiting their application to a narrow spectrum of object …

CLIP4STR: a simple baseline for scene text recognition with pre-trained vision-language model

S Zhao, R Quan, L Zhu, Y Yang - IEEE Transactions on Image …, 2024 - ieeexplore.ieee.org
Pre-trained vision-language models (VLMs) are the de-facto foundation models for various
downstream tasks. However, scene text recognition methods still prefer backbones pre …

Retrieving multimodal information for augmented generation: A survey

R Zhao, H Chen, W Wang, F Jiao, XL Do, C Qin… - arXiv preprint arXiv …, 2023 - arxiv.org
As Large Language Models (LLMs) become popular, there emerged an important trend of
using multimodality to augment the LLMs' generation ability, which enables LLMs to better …

Guiding image captioning models toward more specific captions

S Kornblith, L Li, Z Wang… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Image captioning is conventionally formulated as the task of generating captions that match
the conditional distribution of reference image-caption pairs. However, reference captions in …

Fusing pre-trained language models with multimodal prompts through reinforcement learning

Y Yu, J Chung, H Yun, J Hessel… - Proceedings of the …, 2023 - openaccess.thecvf.com
Language models are capable of commonsense reasoning: while domain-specific
models can learn from explicit knowledge (e.g., commonsense graphs [6], ethical norms [25]) …

Zero-shot visual relation detection via composite visual cues from large language models

L Li, J Xiao, G Chen, J Shao… - Advances in Neural …, 2024 - proceedings.neurips.cc
Pretrained vision-language models, such as CLIP, have demonstrated strong generalization
capabilities, making them promising tools in the realm of zero-shot visual recognition. Visual …