PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation

J Chen, C Ge, E Xie, Y Wu, L Yao, X Ren… - … on Computer Vision, 2024 - Springer
In this paper, we introduce PixArt-Σ, a Diffusion Transformer model (DiT) capable of directly
generating images at 4K resolution. PixArt-Σ represents a significant advancement over its …

TORE: Token reduction for efficient human mesh recovery with transformer

Z Dou, Q Wu, C Lin, Z Cao, Q Wu… - Proceedings of the …, 2023 - openaccess.thecvf.com
In this paper, we introduce a set of simple yet effective TOken REduction (TORE) strategies
for Transformer-based Human Mesh Recovery from monocular images. Current SOTA …

HRVDA: High-resolution visual document assistant

C Liu, K Yin, H Cao, X Jiang, X Li… - Proceedings of the …, 2024 - openaccess.thecvf.com
Leveraging vast training data, multimodal large language models (MLLMs) have
demonstrated formidable general visual comprehension capabilities and achieved …

Dynamic token pruning in plain vision transformers for semantic segmentation

Q Tang, B Zhang, J Liu, F Liu… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Vision transformers have achieved leading performance on various visual tasks yet still
suffer from high computational complexity. The situation deteriorates in dense prediction …

Sparsity in transformers: A systematic literature review

M Farina, U Ahmad, A Taha, H Younes, Y Mesbah… - Neurocomputing, 2024 - Elsevier
Transformers have become the state-of-the-art architectures for various tasks in Natural
Language Processing (NLP) and Computer Vision (CV); however, their space and …

Restore-RWKV: Efficient and effective medical image restoration with RWKV

Z Yang, H Zhang, D Zhao, B Wei, Y Xu - arXiv preprint arXiv:2407.11087, 2024 - arxiv.org
Transformers have revolutionized medical image restoration, but the quadratic complexity
still poses limitations for their application to high-resolution medical images. The recent …

Sparse-Tuning: Adapting vision transformers with efficient fine-tuning and inference

T Liu, X Liu, S Huang, L Shi, Z Xu, Y Xin, Q Yin… - arXiv preprint arXiv …, 2024 - arxiv.org
Parameter-efficient fine-tuning (PEFT) has emerged as a popular solution for adapting pre-
trained Vision Transformer (ViT) models to downstream applications. While current PEFT …

A finite element-convolutional neural network model (FE-CNN) for stress field analysis around arbitrary inclusions

M Rezasefat, JD Hogan - Machine Learning: Science and …, 2023 - iopscience.iop.org
This study presents a data-driven finite element-machine learning surrogate model for
predicting the end-to-end full-field stress distribution and stress concentration around an …

Agglomerative Token Clustering

JB Haurum, S Escalera, GW Taylor… - European Conference on …, 2024 - Springer
We present Agglomerative Token Clustering (ATC), a novel token merging method
that consistently outperforms previous token merging and pruning methods across image …

US-Net: U-shaped network with Convolutional Attention Mechanism for ultrasound medical images

X Xie, P Liu, Y Lang, Z Guo, Z Yang, Y Zhao - Computers & Graphics, 2024 - Elsevier
Ultrasound imaging, characterized by low contrast, high noise, and interference from
surrounding tissues, poses significant challenges in lesion segmentation. To tackle these …