PIXART-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
In this paper, we introduce PixArt-Σ, a Diffusion Transformer model (DiT) capable of directly
generating images at 4K resolution. PixArt-Σ represents a significant advancement over its …
Tore: Token reduction for efficient human mesh recovery with transformer
In this paper, we introduce a set of simple yet effective TOken REduction (TORE) strategies
for Transformer-based Human Mesh Recovery from monocular images. Current SOTA …
Hrvda: High-resolution visual document assistant
Leveraging vast training data, multimodal large language models (MLLMs) have
demonstrated formidable general visual comprehension capabilities and achieved …
Dynamic token pruning in plain vision transformers for semantic segmentation
Vision transformers have achieved leading performance on various visual tasks yet still
suffer from high computational complexity. The situation deteriorates in dense prediction …
Sparsity in transformers: A systematic literature review
Transformers have become the state-of-the-art architectures for various tasks in Natural
Language Processing (NLP) and Computer Vision (CV); however, their space and …
Restore-rwkv: Efficient and effective medical image restoration with rwkv
Transformers have revolutionized medical image restoration, but the quadratic complexity
still poses limitations for their application to high-resolution medical images. The recent …
Sparse-Tuning: Adapting vision transformers with efficient fine-tuning and inference
Parameter-efficient fine-tuning (PEFT) has emerged as a popular solution for adapting pre-
trained Vision Transformer (ViT) models to downstream applications. While current PEFT …
A finite element-convolutional neural network model (FE-CNN) for stress field analysis around arbitrary inclusions
M Rezasefat, JD Hogan - Machine Learning: Science and …, 2023 - iopscience.iop.org
This study presents a data-driven finite element-machine learning surrogate model for
predicting the end-to-end full-field stress distribution and stress concentration around an …
Agglomerative Token Clustering
We present Agglomerative Token Clustering (ATC), a novel token merging method
that consistently outperforms previous token merging and pruning methods across image …
US-Net: U-shaped network with Convolutional Attention Mechanism for ultrasound medical images
X Xie, P Liu, Y Lang, Z Guo, Z Yang, Y Zhao - Computers & Graphics, 2024 - Elsevier
Ultrasound imaging, characterized by low contrast, high noise, and interference from
surrounding tissues, poses significant challenges in lesion segmentation. To tackle these …