A survey of transformers

T Lin, Y Wang, X Liu, X Qiu - AI Open, 2022 - Elsevier
Transformers have achieved great success in many artificial intelligence fields, such as
natural language processing, computer vision, and audio processing. Therefore, it is natural …

A survey of deep active learning

P Ren, Y Xiao, X Chang, PY Huang, Z Li… - ACM Computing …, 2021 - dl.acm.org
Active learning (AL) attempts to maximize a model's performance gain while annotating the
fewest samples possible. Deep learning (DL) is greedy for data and requires a large amount …

Should ChatGPT be biased? Challenges and risks of bias in large language models

E Ferrara - arXiv preprint arXiv:2304.03738, 2023 - arxiv.org
As the capabilities of generative language models continue to advance, the implications of
biases ingrained within these models have garnered increasing attention from researchers …

SiT: Exploring flow and diffusion-based generative models with scalable interpolant transformers

N Ma, M Goldstein, MS Albergo, NM Boffi… - … on Computer Vision, 2024 - Springer
We present Scalable Interpolant Transformers (SiT), a family of generative models
built on the backbone of Diffusion Transformers (DiT). The interpolant framework, which …

ChatGPT or human? Detect and explain. Explaining decisions of machine learning model for detecting short ChatGPT-generated text

S Mitrović, D Andreoletti, O Ayoub - arXiv preprint arXiv:2301.13852, 2023 - arxiv.org
ChatGPT has the ability to generate grammatically flawless and seemingly human replies to
different types of questions from various domains. The number of its users and of its …

Why transformers need Adam: A Hessian perspective

Y Zhang, C Chen, T Ding, Z Li… - Advances in Neural …, 2025 - proceedings.neurips.cc
SGD performs worse than Adam by a significant margin on Transformers, but the reason
remains unclear. In this work, we provide an explanation through the lens of the Hessian: (i) …

Scene text recognition with permuted autoregressive sequence models

D Bautista, R Atienza - European Conference on Computer Vision, 2022 - Springer
Context-aware STR methods typically use internal autoregressive (AR) language models
(LMs). Inherent limitations of AR models motivated two-stage methods which employ an …

Early convolutions help transformers see better

T Xiao, M Singh, E Mintun, T Darrell… - Advances in Neural …, 2021 - proceedings.neurips.cc
Vision transformer (ViT) models exhibit substandard optimizability. In particular, they are
sensitive to the choice of optimizer (AdamW vs. SGD), optimizer hyperparameters, and …

Revisiting deep learning models for tabular data

Y Gorishniy, I Rubachev, V Khrulkov… - Advances in Neural …, 2021 - proceedings.neurips.cc
The existing literature on deep learning for tabular data proposes a wide range of novel
architectures and reports competitive results on various datasets. However, the proposed …

UTNet: A hybrid transformer architecture for medical image segmentation

Y Gao, M Zhou, DN Metaxas - …, France, September 27–October 1, 2021 …, 2021 - Springer
The Transformer architecture has proven successful in a number of natural language
processing tasks. However, its applications to medical vision remain largely unexplored. In …