DINO-X: A unified vision model for open-world object detection and understanding
Instruction-guided scene text recognition
Multi-modal models have shown appealing performance in visual recognition tasks, as free-form text-guided training evokes the ability to understand fine-grained visual content …
CreatiLayout: Siamese Multimodal Diffusion Transformer for Creative Layout-to-Image Generation
H Zhang, D Hong, T Gao, Y Wang, J Shao… - ar** via Semantic and Geometric Guided Segmentation