Unleashing the power of edge-cloud generative AI in mobile networks: A survey of AIGC services

M Xu, H Du, D Niyato, J Kang, Z **ong… - … Surveys & Tutorials, 2024 - ieeexplore.ieee.org
Artificial Intelligence-Generated Content (AIGC) is an automated method for generating,
manipulating, and modifying valuable and diverse data using AI algorithms creatively. This …

Parameter-efficient fine-tuning for large models: A comprehensive survey

Z Han, C Gao, J Liu, J Zhang, SQ Zhang - arxiv preprint arxiv:2403.14608, 2024 - arxiv.org
Large models represent a groundbreaking advancement in multiple application fields,
enabling remarkable achievements across various tasks. However, their unprecedented …

Dinov2: Learning robust visual features without supervision

M Oquab, T Darcet, T Moutakanni, H Vo… - arxiv preprint arxiv …, 2023 - arxiv.org
The recent breakthroughs in natural language processing for model pretraining on large
quantities of data have opened the way for similar foundation models in computer vision …

Imagebind: One embedding space to bind them all

R Girdhar, A El-Nouby, Z Liu, M Singh… - Proceedings of the …, 2023 - openaccess.thecvf.com
We present ImageBind, an approach to learn a joint embedding across six different
modalities-images, text, audio, depth, thermal, and IMU data. We show that all combinations …

Mvbench: A comprehensive multi-modal video understanding benchmark

K Li, Y Wang, Y He, Y Li, Y Wang… - Proceedings of the …, 2024 - openaccess.thecvf.com
With the rapid development of Multi-modal Large Language Models (MLLMs) a number of
diagnostic benchmarks have recently emerged to evaluate the comprehension capabilities …

Videomae v2: Scaling video masked autoencoders with dual masking

L Wang, B Huang, Z Zhao, Z Tong… - Proceedings of the …, 2023 - openaccess.thecvf.com
Scale is the primary factor for building a powerful foundation model that could well
generalize to a variety of downstream tasks. However, it is still challenging to train video …

Vbench: Comprehensive benchmark suite for video generative models

Z Huang, Y He, J Yu, F Zhang, C Si… - Proceedings of the …, 2024 - openaccess.thecvf.com
Video generation has witnessed significant advancements yet evaluating these models
remains a challenge. A comprehensive evaluation benchmark for video generation is …

Scaling vision transformers to 22 billion parameters

M Dehghani, J Djolonga, B Mustafa… - International …, 2023 - proceedings.mlr.press
The scaling of Transformers has driven breakthrough capabilities for language models. At
present, the largest large language models (LLMs) contain upwards of 100B parameters …

Videomamba: State space model for efficient video understanding

K Li, X Li, Y Wang, Y He, Y Wang, L Wang… - European Conference on …, 2024 - Springer
Addressing the dual challenges of local redundancy and global dependencies in video
understanding, this work innovatively adapts the Mamba to the video domain. The proposed …

Unireplknet: A universal perception large-kernel convnet for audio video point cloud time-series and image recognition

X Ding, Y Zhang, Y Ge, S Zhao… - Proceedings of the …, 2024 - openaccess.thecvf.com
Large-kernel convolutional neural networks (ConvNets) have recently received extensive
research attention but two unresolved and critical issues demand further investigation. 1) …