Artificial intelligence for multimodal data integration in oncology

J Lipkova, RJ Chen, B Chen, MY Lu, M Barbieri… - Cancer cell, 2022 - cell.com
In oncology, the patient state is characterized by a whole spectrum of modalities, ranging
from radiology, histology, and genomics to electronic health records. Current artificial …

[HTML][HTML] Review of image classification algorithms based on convolutional neural networks

L Chen, S Li, Q Bai, J Yang, S Jiang, Y Miao - Remote Sensing, 2021 - mdpi.com
Image classification has always been a hot research direction in the world, and the
emergence of deep learning has promoted the development of this field. Convolutional …

Motiondiffuse: Text-driven human motion generation with diffusion model

M Zhang, Z Cai, L Pan, F Hong, X Guo, L Yang… - arxiv preprint arxiv …, 2022 - arxiv.org
Human motion modeling is important for many modern graphics applications, which typically
require professional skills. In order to remove the skill barriers for laymen, recent motion …

Video probabilistic diffusion models in projected latent space

S Yu, K Sohn, S Kim, J Shin - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
Despite the remarkable progress in deep generative models, synthesizing high-resolution
and temporally coherent videos still remains a challenge due to their high-dimensionality …

GhostNetv2: Enhance cheap operation with long-range attention

Y Tang, K Han, J Guo, C Xu, C Xu… - Advances in Neural …, 2022 - proceedings.neurips.cc
Light-weight convolutional neural networks (CNNs) are specially designed for applications
on mobile devices with faster inference speed. The convolutional operation can only capture …

Cogvideo: Large-scale pretraining for text-to-video generation via transformers

W Hong, M Ding, W Zheng, X Liu, J Tang - arxiv preprint arxiv:2205.15868, 2022 - arxiv.org
Large-scale pretrained transformers have created milestones in text (GPT-3) and text-to-
image (DALL-E and CogView) generation. Its application to video generation is still facing …

Expanding language-image pretrained models for general video recognition

B Ni, H Peng, M Chen, S Zhang, G Meng, J Fu… - … on Computer Vision, 2022 - Springer
Contrastive language-image pretraining has shown great success in learning visual-textual
joint representation from web-scale data, demonstrating remarkable “zero-shot” …

St-adapter: Parameter-efficient image-to-video transfer learning

J Pan, Z Lin, X Zhu, J Shao, H Li - Advances in Neural …, 2022 - proceedings.neurips.cc
Capitalizing on large pre-trained models for various downstream tasks of interest have
recently emerged with promising performance. Due to the ever-growing model size, the …

Multiview transformers for video recognition

S Yan, X **ong, A Arnab, Z Lu… - Proceedings of the …, 2022 - openaccess.thecvf.com
Video understanding requires reasoning at multiple spatiotemporal resolutions--from short
fine-grained motions to events taking place over longer durations. Although transformer …

S4nd: Modeling images and videos as multidimensional signals with state spaces

E Nguyen, K Goel, A Gu, G Downs… - Advances in neural …, 2022 - proceedings.neurips.cc
Visual data such as images and videos are typically modeled as discretizations of inherently
continuous, multidimensional signals. Existing continuous-signal models attempt to exploit …