Reinforcement learning algorithms: A brief survey

AK Shakya, G Pillai, S Chakrabarty - Expert Systems with Applications, 2023 - Elsevier
Reinforcement Learning (RL) is a machine learning (ML) technique to learn sequential
decision-making in complex problems. RL is inspired by trial-and-error based human/animal …

Artificial intelligence for multimodal data integration in oncology

J Lipkova, RJ Chen, B Chen, MY Lu, M Barbieri… - Cancer cell, 2022 - cell.com
In oncology, the patient state is characterized by a whole spectrum of modalities, ranging
from radiology, histology, and genomics to electronic health records. Current artificial …

GhostNetV2: Enhance cheap operation with long-range attention

Y Tang, K Han, J Guo, C Xu, C Xu… - Advances in Neural …, 2022 - proceedings.neurips.cc
Light-weight convolutional neural networks (CNNs) are specially designed for applications
on mobile devices with faster inference speed. The convolutional operation can only capture …

MotionDiffuse: Text-driven human motion generation with diffusion model

M Zhang, Z Cai, L Pan, F Hong, X Guo… - IEEE transactions on …, 2024 - ieeexplore.ieee.org
Human motion modeling is important for many modern graphics applications, which typically
require professional skills. In order to remove the skill barriers for laymen, recent motion …

Video probabilistic diffusion models in projected latent space

S Yu, K Sohn, S Kim, J Shin - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
Despite the remarkable progress in deep generative models, synthesizing high-resolution
and temporally coherent videos still remains a challenge due to their high-dimensionality …

LanguageBind: Extending video-language pretraining to N-modality by language-based semantic alignment

B Zhu, B Lin, M Ning, Y Yan, J Cui, HF Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
The video-language (VL) pretraining has achieved remarkable improvement in multiple
downstream tasks. However, the current VL pretraining framework is hard to extend to …

Expanding language-image pretrained models for general video recognition

B Ni, H Peng, M Chen, S Zhang, G Meng, J Fu… - European conference on …, 2022 - Springer
Contrastive language-image pretraining has shown great success in learning visual-textual
joint representation from web-scale data, demonstrating remarkable “zero-shot” …

CogVideo: Large-scale pretraining for text-to-video generation via transformers

W Hong, M Ding, W Zheng, X Liu, J Tang - arXiv preprint arXiv:2205.15868, 2022 - arxiv.org
Large-scale pretrained transformers have created milestones in text (GPT-3) and text-to-
image (DALL-E and CogView) generation. Its application to video generation is still facing …

S4ND: Modeling images and videos as multidimensional signals with state spaces

E Nguyen, K Goel, A Gu, G Downs… - Advances in neural …, 2022 - proceedings.neurips.cc
Visual data such as images and videos are typically modeled as discretizations of inherently
continuous, multidimensional signals. Existing continuous-signal models attempt to exploit …

ST-Adapter: Parameter-efficient image-to-video transfer learning

J Pan, Z Lin, X Zhu, J Shao, H Li - Advances in Neural …, 2022 - proceedings.neurips.cc
Capitalizing on large pre-trained models for various downstream tasks of interest has
recently emerged with promising performance. Due to the ever-growing model size, the …