Google Scholar

M Xu, H Du, D Niyato, J Kang, Z Xiong… - … Surveys & Tutorials, 2024 - ieeexplore.ieee.org

Artificial Intelligence-Generated Content (AIGC) is an automated method for generating,
manipulating, and modifying valuable and diverse data using AI algorithms creatively. This …

Cited by: 245

[PDF] arxiv.org

Parameter-efficient fine-tuning for large models: A comprehensive survey

Z Han, C Gao, J Liu, J Zhang, SQ Zhang - arXiv preprint arXiv:2403.14608, 2024 - arxiv.org

Large models represent a groundbreaking advancement in multiple application fields,
enabling remarkable achievements across various tasks. However, their unprecedented …

Cited by: 258

[PDF] arxiv.org

DINOv2: Learning robust visual features without supervision

M Oquab, T Darcet, T Moutakanni, H Vo… - arXiv preprint arXiv …, 2023 - arxiv.org

The recent breakthroughs in natural language processing for model pretraining on large
quantities of data have opened the way for similar foundation models in computer vision …

Cited by: 2374

[PDF] thecvf.com

ImageBind: One embedding space to bind them all

R Girdhar, A El-Nouby, Z Liu, M Singh… - Proceedings of the …, 2023 - openaccess.thecvf.com

We present ImageBind, an approach to learn a joint embedding across six different
modalities: images, text, audio, depth, thermal, and IMU data. We show that all combinations …

Cited by: 846

[PDF] thecvf.com

MVBench: A comprehensive multi-modal video understanding benchmark

K Li, Y Wang, Y He, Y Li, Y Wang… - Proceedings of the …, 2024 - openaccess.thecvf.com

With the rapid development of Multi-modal Large Language Models (MLLMs), a number of
diagnostic benchmarks have recently emerged to evaluate the comprehension capabilities …

Cited by: 267

[PDF] thecvf.com

VideoMAE V2: Scaling video masked autoencoders with dual masking

L Wang, B Huang, Z Zhao, Z Tong… - Proceedings of the …, 2023 - openaccess.thecvf.com

Scale is the primary factor for building a powerful foundation model that could well
generalize to a variety of downstream tasks. However, it is still challenging to train video …

Cited by: 382

[PDF] thecvf.com

VBench: Comprehensive benchmark suite for video generative models

Z Huang, Y He, J Yu, F Zhang, C Si… - Proceedings of the …, 2024 - openaccess.thecvf.com

Video generation has witnessed significant advancements, yet evaluating these models
remains a challenge. A comprehensive evaluation benchmark for video generation is …

Cited by: 224

[PDF] mlr.press

Scaling vision transformers to 22 billion parameters

M Dehghani, J Djolonga, B Mustafa… - International …, 2023 - proceedings.mlr.press

The scaling of Transformers has driven breakthrough capabilities for language models. At
present, the largest large language models (LLMs) contain upwards of 100B parameters …

Cited by: 569

[PDF] arxiv.org

VideoMamba: State space model for efficient video understanding

K Li, X Li, Y Wang, Y He, Y Wang, L Wang… - European Conference on …, 2024 - Springer

Addressing the dual challenges of local redundancy and global dependencies in video
understanding, this work innovatively adapts the Mamba to the video domain. The proposed …

Cited by: 153

[PDF] thecvf.com

UniRepLKNet: A universal perception large-kernel ConvNet for audio, video, point cloud, time-series and image recognition

X Ding, Y Zhang, Y Ge, S Zhao… - Proceedings of the …, 2024 - openaccess.thecvf.com

Large-kernel convolutional neural networks (ConvNets) have recently received extensive
research attention, but two unresolved and critical issues demand further investigation. 1) …

Cited by: 139

The kinetics human action video dataset

Unleashing the power of edge-cloud generative AI in mobile networks: A survey of AIGC services
