Transformers in vision: A survey

S Khan, M Naseer, M Hayat, SW Zamir… - ACM Computing …, 2022 - dl.acm.org
Astounding results from Transformer models on natural language tasks have intrigued the
vision community to study their application to computer vision problems. Among their salient …

Multimodal image synthesis and editing: A survey and taxonomy

F Zhan, Y Yu, R Wu, J Zhang, S Lu, L Liu… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org
As information exists in various modalities in real world, effective interaction and fusion
among multimodal information plays a key role for the creation and perception of multimodal …

StyleCLIP: Text-driven manipulation of StyleGAN imagery

O Patashnik, Z Wu, E Shechtman… - Proceedings of the …, 2021 - openaccess.thecvf.com
Inspired by the ability of StyleGAN to generate highly realistic images in a variety of
domains, much recent work has focused on understanding how to use the latent spaces …

Understanding and creating art with AI: Review and outlook

E Cetinic, J She - ACM Transactions on Multimedia Computing …, 2022 - dl.acm.org
Technologies related to artificial intelligence (AI) have a strong impact on the changes of
research and creative practices in visual arts. The growing number of research initiatives …

Frozen pretrained transformers as universal computation engines

K Lu, A Grover, P Abbeel, I Mordatch - Proceedings of the AAAI …, 2022 - ojs.aaai.org
We investigate the capability of a transformer pretrained on natural language to generalize
to other modalities with minimal finetuning--in particular, without finetuning of the self …

Attention, please! A survey of neural attention models in deep learning

A de Santana Correia, EL Colombini - Artificial Intelligence Review, 2022 - Springer
In humans, Attention is a core property of all perceptual and cognitive operations. Given our
limited ability to process competing sources, attention mechanisms select, modulate, and …

Symbolic music generation with diffusion models

G Mittal, J Engel, C Hawthorne, I Simon - arXiv preprint arXiv:2103.16091, 2021 - arxiv.org
Score-based generative models and diffusion probabilistic models have been successful at
generating high-quality samples in continuous domains such as images and audio …

Generating images with sparse representations

C Nash, J Menick, S Dieleman, PW Battaglia - arXiv preprint arXiv …, 2021 - arxiv.org
The high dimensionality of images presents architecture and sampling-efficiency challenges
for likelihood-based generative models. Previous approaches such as VQ-VAE use deep …

How to Protect Copyright Data in Optimization of Large Language Models?

T Chu, Z Song, C Yang - Proceedings of the AAAI Conference on …, 2024 - ojs.aaai.org
The softmax operator is a crucial component of large language models (LLMs), which have
played a transformative role in computer research. Due to the centrality of the softmax …

Attention approximates sparse distributed memory

T Bricken, C Pehlevan - Advances in Neural Information …, 2021 - proceedings.neurips.cc
While Attention has come to be an important mechanism in deep learning, there remains
limited intuition for why it works so well. Here, we show that Transformer Attention can be …