Multimodal image synthesis and editing: A survey and taxonomy

F Zhan, Y Yu, R Wu, J Zhang, S Lu, L Liu… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org
As information exists in various modalities in real world, effective interaction and fusion
among multimodal information plays a key role for the creation and perception of multimodal …

Deep learning for anomaly detection: A review

G Pang, C Shen, L Cao, AVD Hengel - ACM computing surveys (CSUR), 2021 - dl.acm.org
Anomaly detection, aka outlier detection or novelty detection, has been a lasting yet active
research area in various research communities for several decades. There are still some …

Autoregressive image generation without vector quantization

T Li, Y Tian, H Li, M Deng, K He - Advances in Neural …, 2025 - proceedings.neurips.cc
Conventional wisdom holds that autoregressive models for image generation are typically
accompanied by vector-quantized tokens. We observe that while a discrete-valued space …

Regularized vector quantization for tokenized image synthesis

J Zhang, F Zhan, C Theobalt… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Quantizing images into discrete representations has been a fundamental problem in unified
generative modeling. Predominant approaches learn the discrete representation either in a …

An introduction to variational autoencoders

DP Kingma, M Welling - Foundations and Trends® in …, 2019 - nowpublishers.com

Deep learning in multi-object detection and tracking: state of the art

SK Pal, A Pramanik, J Maiti, P Mitra - Applied Intelligence, 2021 - Springer
Object detection and tracking is one of the most important and challenging branches in
computer vision, and have been widely applied in various fields, such as health-care …

Attention, please! A survey of neural attention models in deep learning

A de Santana Correia, EL Colombini - Artificial Intelligence Review, 2022 - Springer
In humans, Attention is a core property of all perceptual and cognitive operations. Given our
limited ability to process competing sources, attention mechanisms select, modulate, and …

A comprehensive review on deep learning-based methods for video anomaly detection

R Nayak, UC Pati, SK Das - Image and Vision Computing, 2021 - Elsevier
Video surveillance systems are popular and used in public places such as market places,
shopping malls, hospitals, banks, streets, education institutions, city administrative offices …

Neural discrete representation learning

A Van Den Oord, O Vinyals - Advances in neural …, 2017 - proceedings.neurips.cc
Learning useful representations without supervision remains a key challenge in machine
learning. In this paper, we propose a simple yet powerful generative model that learns such …

Categorical reparameterization with gumbel-softmax

E Jang, S Gu, B Poole - arXiv preprint arXiv:1611.01144, 2016 - arxiv.org
Categorical variables are a natural choice for representing discrete structure in the world.
However, stochastic neural networks rarely use categorical latent variables due to the …