A comprehensive survey on applications of transformers for deep learning tasks

S Islam, H Elmekki, A Elsebai, J Bentahar… - Expert Systems with …, 2024 - Elsevier
Transformers are Deep Neural Networks (DNNs) that utilize a self-attention
mechanism to capture contextual relationships within sequential data. Unlike traditional …

Large language models for human–robot interaction: A review

C Zhang, J Chen, J Li, Y Peng, Z Mao - Biomimetic Intelligence and …, 2023 - Elsevier
The fusion of large language models and robotic systems has introduced a transformative
paradigm in human–robot interaction, offering unparalleled capabilities in natural language …

Autoregressive image generation without vector quantization

T Li, Y Tian, H Li, M Deng, K He - Advances in Neural …, 2025 - proceedings.neurips.cc
Conventional wisdom holds that autoregressive models for image generation are typically
accompanied by vector-quantized tokens. We observe that while a discrete-valued space …

Scalable diffusion models with transformers

W Peebles, S Xie - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
We explore a new class of diffusion models based on the transformer architecture. We train
latent diffusion models of images, replacing the commonly-used U-Net backbone with a …

RT-1: Robotics transformer for real-world control at scale

A Brohan, N Brown, J Carbajal, Y Chebotar… - arXiv preprint arXiv …, 2022 - arxiv.org
By transferring knowledge from large, diverse, task-agnostic datasets, modern machine
learning models can solve specific downstream tasks either zero-shot or with small task …

SiT: Exploring flow and diffusion-based generative models with scalable interpolant transformers

N Ma, M Goldstein, MS Albergo, NM Boffi… - … on Computer Vision, 2024 - Springer
We present Scalable Interpolant Transformers (SiT), a family of generative models
built on the backbone of Diffusion Transformers (DiT). The interpolant framework, which …

Efficient and explicit modelling of image hierarchies for image restoration

Y Li, Y Fan, X Xiang, D Demandolx… - Proceedings of the …, 2023 - openaccess.thecvf.com
The aim of this paper is to propose a mechanism to efficiently and explicitly model image
hierarchies in the global, regional, and local range for image restoration. To achieve that, we …

Latte: Latent diffusion transformer for video generation

X Ma, Y Wang, G Jia, X Chen, Z Liu, YF Li… - arXiv preprint arXiv …, 2024 - arxiv.org
We propose a novel Latent Diffusion Transformer, namely Latte, for video generation. Latte
first extracts spatio-temporal tokens from input videos and then adopts a series of …

TransUNet: Rethinking the U-Net architecture design for medical image segmentation through the lens of transformers

J Chen, J Mei, X Li, Y Lu, Q Yu, Q Wei, X Luo, Y Xie… - Medical Image …, 2024 - Elsevier
Medical image segmentation is crucial for healthcare, yet convolution-based methods like
U-Net face limitations in modeling long-range dependencies. To address this, Transformers …

What can transformers learn in-context? a case study of simple function classes

S Garg, D Tsipras, PS Liang… - Advances in Neural …, 2022 - proceedings.neurips.cc
In-context learning is the ability of a model to condition on a prompt sequence consisting of
in-context examples (input-output pairs corresponding to some task) along with a new query …