On the hidden mystery of OCR in large multimodal models Y Liu, Z Li, M Huang, B Yang, W Yu, C Li, XC Yin, CL Liu, L Jin, X Bai arXiv preprint arXiv:2305.07895, 2023 | 174 | 2023 |
SwinTextSpotter: Scene text spotting via better synergy between text detection and text recognition M Huang, Y Liu, Z Peng, C Liu, D Lin, S Zhu, N Yuan, K Ding, L Jin Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022 | 140 | 2022 |
SPTS: single-point text spotting D Peng, X Wang, Y Liu, J Zhang, M Huang, S Lai, J Li, S Zhu, D Lin, ... Proceedings of the 30th ACM International Conference on Multimedia, 4272-4281, 2022 | 63 | 2022 |
SPTS v2: single-point scene text spotting Y Liu, J Zhang, D Peng, M Huang, X Wang, J Tang, C Huang, D Lin, ... IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023 | 51 | 2023 |
ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer M Huang, J Zhang, D Peng, H Lu, C Huang, Y Liu, X Bai, L Jin Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023 | 32 | 2023 |
OCRBench: on the hidden mystery of OCR in large multimodal models Y Liu, Z Li, M Huang, B Yang, W Yu, C Li, XC Yin, CL Liu, L Jin, X Bai Science China Information Sciences 67 (12), 220102, 2024 | 11 | 2024 |
Mini-Monkey: Alleviating the semantic sawtooth effect for lightweight MLLMs via complementary image pyramid M Huang, Y Liu, D Liang, L Jin, X Bai arXiv preprint arXiv:2408.02034, 2024 | 11* | 2024 |
Hierarchical side-tuning for vision transformers W Lin, Z Wu, W Yang, M Huang, J Huang, L Jin arXiv preprint arXiv:2310.05393, 2023 | 10 | 2023 |
SwinTextSpotter v2: Towards better synergy for scene text spotting M Huang, D Peng, H Li, Z Peng, C Liu, D Lin, Y Liu, X Bai, L Jin arXiv preprint arXiv:2401.07641, 2024 | 3 | 2024 |
DTDT: Highly Accurate Dense Text Line Detection in Historical Documents via Dynamic Transformer H Li, C Liu, J Wang, M Huang, W Zhou, L Jin International Conference on Document Analysis and Recognition, 381-396, 2023 | 2 | 2023 |
Progressive Evolution from Single-Point to Polygon for Scene Text L Deng, M Huang, X Xie, Y Liu, L Jin, X Bai International Conference on Document Analysis and Recognition, 111-128, 2024 | 1 | 2024 |
VimTS: A Unified Video and Image Text Spotter for Enhancing the Cross-domain Generalization Y Liu, M Huang, H Yan, L Deng, W Wu, H Lu, C Shen, L Jin, X Bai IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025 | | 2025 |
OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning L Fu, B Yang, Z Kuang, J Song, Y Li, L Zhu, Q Luo, X Wang, H Lu, ... arXiv preprint arXiv:2501.00321, 2024 | | 2024 |
Bridging the Gap Between End-to-End and Two-Step Text Spotting M Huang, H Li, Y Liu, X Bai, L Jin Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024 | | 2024 |