Instruction-guided scene text recognition

Y Du, Z Chen, Y Su, C Jia… - IEEE Transactions on …, 2025 - ieeexplore.ieee.org
Multi-modal models have shown appealing performance in visual recognition tasks, as free-form
text-guided training evokes the ability to understand fine-grained visual content …

CreatiLayout: Siamese Multimodal Diffusion Transformer for Creative Layout-to-Image Generation

H Zhang, D Hong, T Gao, Y Wang, J Shao… - arXiv …

… via Semantic and Geometric Guided Segmentation

H Li, W Mao, W Deng, C Meng, R Zhang, F Jia… - arXiv …
…, which involves grasping specific parts of objects based on their
functions, is crucial for developing advanced robotic systems capable of performing complex …