| Title | Authors | Venue | Cited by | Year |
|---|---|---|---|---|
| VILA-U: A Unified Foundation Model Integrating Visual Understanding and Generation | Y Wu, Z Zhang, J Chen, H Tang, D Li, Y Fang, L Zhu, E Xie, H Yin, L Yi, ... | arXiv preprint arXiv:2409.04429 | 34 | 2024 |
| HART: Efficient Visual Generation with Hybrid Autoregressive Transformer | H Tang, Y Wu, S Yang, E Xie, J Chen, J Chen, Z Zhang, H Cai, Y Lu, ... | arXiv preprint arXiv:2410.10812 | 16 | 2024 |