A review of recurrent neural networks: LSTM cells and network architectures
Y Yu, X Si, C Hu, J Zhang - Neural computation, 2019 - direct.mit.edu
Recurrent neural networks (RNNs) have been widely adopted in research areas concerned
with sequential data, such as text, audio, and video. However, RNNs consisting of sigma …
Image and video compression with neural networks: A review
In recent years, the image and video coding technologies have advanced by leaps and
bounds. However, due to the popularization of image and video acquisition devices, the …
Preserve your own correlation: A noise prior for video diffusion models
Despite tremendous progress in generating high-quality images using diffusion models,
synthesizing a sequence of animated frames that are both photorealistic and temporally …
Imagen video: High definition video generation with diffusion models
We present Imagen Video, a text-conditional video generation system based on a cascade
of video diffusion models. Given a text prompt, Imagen Video generates high definition …
Sequential modeling enables scalable learning for large vision models
We introduce a novel sequential modeling approach which enables learning a Large Vision
Model (LVM) without making use of any linguistic data. To do this we define a common …
DriveDreamer: Towards real-world-driven world models for autonomous driving
World models, especially in autonomous driving, are trending and drawing extensive
attention due to their capacity for comprehending driving environments. The established …
Phenaki: Variable length video generation from open domain textual descriptions
We present Phenaki, a model capable of realistic video synthesis given a sequence of
textual prompts. Generating videos from text is particularly challenging due to the …
Factorizing text-to-video generation by explicit image conditioning
We present Emu Video, a text-to-video generation model that factorizes the
generation into two steps: first generating an image conditioned on the text, and then …
SimVP: Simpler yet better video prediction
From CNN, RNN, to ViT, we have witnessed remarkable advancements in video
prediction, incorporating auxiliary inputs, elaborate neural architectures, and sophisticated …
StyleGAN-V: A continuous video generator with the price, image quality and perks of StyleGAN2
I Skorokhodov, S Tulyakov… - Proceedings of the …, 2022 - openaccess.thecvf.com
Videos show continuous events, yet most--if not all--video synthesis frameworks treat them
discretely in time. In this work, we think of videos as what they should be--time-continuous …