A comprehensive survey on applications of transformers for deep learning tasks

S Islam, H Elmekki, A Elsebai, J Bentahar… - Expert Systems with …, 2024 - Elsevier
Abstract: Transformers are Deep Neural Networks (DNNs) that utilize a self-attention
mechanism to capture contextual relationships within sequential data. Unlike traditional …
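
The self-attention mechanism referred to here is, at its core, scaled dot-product attention: each token's output is a softmax-weighted combination of every token's value vector. Below is a minimal NumPy sketch of that computation; the single-head setup, toy dimensions, and random weights are illustrative assumptions, not the configuration of any model covered in the survey.

```python
# Minimal sketch of scaled dot-product self-attention (single head, NumPy).
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_k) projection matrices."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv           # project tokens to queries/keys/values
    scores = Q @ K.T / np.sqrt(K.shape[-1])    # pairwise similarities, scaled by sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                         # each token: weighted mix of all tokens

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                    # 4 tokens, d_model = 8 (toy values)
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)     # (4, 8)
```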

A survey on video diffusion models

Z Xing, Q Feng, H Chen, Q Dai, H Hu, H Xu… - ACM Computing …, 2024 - dl.acm.org
The recent wave of AI-generated content (AIGC) has witnessed substantial success in
computer vision, with the diffusion model playing a crucial role in this achievement. Due to …

Videomae v2: Scaling video masked autoencoders with dual masking

L Wang, B Huang, Z Zhao, Z Tong… - Proceedings of the …, 2023 - openaccess.thecvf.com
Scale is the primary factor in building a powerful foundation model that generalizes well
to a variety of downstream tasks. However, it is still challenging to train video …

Scaling vision transformers to 22 billion parameters

M Dehghani, J Djolonga, B Mustafa… - International …, 2023 - proceedings.mlr.press
The scaling of Transformers has driven breakthrough capabilities for language models. At
present, the largest large language models (LLMs) contain upwards of 100B parameters …

Videomamba: State space model for efficient video understanding

K Li, X Li, Y Wang, Y He, Y Wang, L Wang… - European Conference on …, 2024 - Springer
Addressing the dual challenges of local redundancy and global dependencies in video
understanding, this work innovatively adapts Mamba to the video domain. The proposed …
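
Mamba belongs to the family of state space models, which process a token sequence with a linear recurrence rather than pairwise attention, so cost grows linearly with sequence length. The sketch below shows only that generic recurrence applied to a flattened spatiotemporal token sequence; the toy dimensions and fixed (non-selective) parameters are assumptions, and VideoMamba's input-dependent gating and bidirectional scans are omitted.

```python
# Hedged sketch of a plain linear state-space recurrence over video tokens:
# h_t = A h_{t-1} + B x_t,  y_t = C h_t, scanned along the token sequence.
import numpy as np

def ssm_scan(x, A, B, C):
    """x: (T, d_in) tokens; A: (d_state, d_state); B: (d_state, d_in); C: (d_out, d_state)."""
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:                      # sequential scan: cost linear in sequence length
        h = A @ h + B @ x_t            # update hidden state from previous state and input
        ys.append(C @ h)               # per-token readout
    return np.stack(ys)

rng = np.random.default_rng(0)
tokens = rng.normal(size=(16 * 14 * 14, 32))   # e.g. 16 frames of 14x14 patches, flattened
A = 0.9 * np.eye(64)                           # stable toy state transition
B = rng.normal(size=(64, 32)) * 0.1
C = rng.normal(size=(32, 64)) * 0.1
print(ssm_scan(tokens, A, B, C).shape)         # (3136, 32)
```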

Retinexformer: One-stage retinex-based transformer for low-light image enhancement

Y Cai, H Bian, J Lin, H Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
When enhancing low-light images, many deep learning algorithms are based on the Retinex
theory. However, the Retinex model does not consider the corruptions hidden in the dark or …
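
The Retinex theory mentioned here models an observed image as the element-wise product of reflectance and illumination, I = R ⊙ L, so enhancement amounts to estimating a smooth illumination map and brightening it. The following is a rough sketch of that idea only, assuming a Gaussian-blur illumination estimate, a gamma adjustment, and SciPy availability; it is not Retinexformer's one-stage transformer pipeline.

```python
# Rough sketch of Retinex-style enhancement: I = R * L, recover R = I / L,
# then relight with a brightened illumination map. Parameters are illustrative.
import numpy as np
from scipy.ndimage import gaussian_filter

def retinex_enhance(img, sigma=15.0, gamma=0.5, eps=1e-4):
    """img: float array in [0, 1], shape (H, W) or (H, W, 3)."""
    lum = img.mean(axis=-1, keepdims=True) if img.ndim == 3 else img
    L = gaussian_filter(lum, sigma=sigma) + eps    # crude smooth illumination estimate
    R = img / L                                    # reflectance under the Retinex model
    L_adj = L ** gamma                             # brighten the dark illumination
    return np.clip(R * L_adj, 0.0, 1.0)            # recompose the enhanced image

dark = np.clip(np.random.default_rng(0).random((64, 64, 3)) * 0.2, 0, 1)
print(retinex_enhance(dark).mean() > dark.mean())  # True: output is brighter
```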

Videocomposer: Compositional video synthesis with motion controllability

X Wang, H Yuan, S Zhang, D Chen… - Advances in …, 2023 - proceedings.neurips.cc
The pursuit of controllability as a higher standard of visual content creation has yielded
remarkable progress in customizable image synthesis. However, achieving controllable …

Modelscope text-to-video technical report

J Wang, H Yuan, D Chen, Y Zhang, X Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
This paper introduces ModelScopeT2V, a text-to-video synthesis model that evolves from a
text-to-image synthesis model (i.e., Stable Diffusion). ModelScopeT2V incorporates spatio …

Moviechat: From dense token to sparse memory for long video understanding

E Song, W Chai, G Wang, Y Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com
Recently, integrating video foundation models and large language models to build a video
understanding system can overcome the limitations of specific pre-defined vision tasks. Yet …

Hiera: A hierarchical vision transformer without the bells-and-whistles

C Ryali, YT Hu, D Bolya, C Wei, H Fan… - International …, 2023 - proceedings.mlr.press
Modern hierarchical vision transformers have added several vision-specific components in
the pursuit of supervised classification performance. While these components lead to …