A survey of human-in-the-loop for machine learning

X Wu, L Xiao, Y Sun, J Zhang, T Ma, L He - Future Generation Computer …, 2022 - Elsevier
Machine learning has become the state-of-the-art technique for many tasks
including computer vision, natural language processing, speech processing tasks, etc …

A review of deep learning methods for semantic segmentation of remote sensing imagery

X Yuan, J Shi, L Gu - Expert Systems with Applications, 2021 - Elsevier
Semantic segmentation of remote sensing imagery has been employed in many
applications and is a key research topic for decades. With the success of deep learning …

DINOv2: Learning robust visual features without supervision

M Oquab, T Darcet, T Moutakanni, H Vo… - arXiv preprint arXiv …, 2023 - arxiv.org
The recent breakthroughs in natural language processing for model pretraining on large
quantities of data have opened the way for similar foundation models in computer vision …

ImageBind: One embedding space to bind them all

R Girdhar, A El-Nouby, Z Liu, M Singh… - Proceedings of the …, 2023 - openaccess.thecvf.com
We present ImageBind, an approach to learn a joint embedding across six different
modalities: images, text, audio, depth, thermal, and IMU data. We show that all combinations …

A survey on multimodal large language models

S Yin, C Fu, S Zhao, K Li, X Sun, T Xu… - arXiv preprint arXiv …, 2023 - arxiv.org
Multimodal Large Language Model (MLLM) recently has been a new rising research
hotspot, which uses powerful Large Language Models (LLMs) as a brain to perform …

ConvNeXt V2: Co-designing and scaling convnets with masked autoencoders

S Woo, S Debnath, R Hu, X Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com
Driven by improved architectures and better representation learning frameworks, the field of
visual recognition has enjoyed rapid modernization and performance boost in the early …

InternImage: Exploring large-scale vision foundation models with deformable convolutions

W Wang, J Dai, Z Chen, Z Huang, Z Li… - Proceedings of the …, 2023 - openaccess.thecvf.com
Compared to the great progress of large-scale vision transformers (ViTs) in recent years,
large-scale models based on convolutional neural networks (CNNs) are still in an early …

Self-supervised learning from images with a joint-embedding predictive architecture

M Assran, Q Duval, I Misra… - Proceedings of the …, 2023 - openaccess.thecvf.com
This paper demonstrates an approach for learning highly semantic image representations
without relying on hand-crafted data-augmentations. We introduce the Image-based Joint …

Masked autoencoders are scalable vision learners

K He, X Chen, S Xie, Y Li, P Dollár… - Proceedings of the …, 2022 - openaccess.thecvf.com
This paper shows that masked autoencoders (MAE) are scalable self-supervised learners
for computer vision. Our MAE approach is simple: we mask random patches of the input …

VBench: Comprehensive benchmark suite for video generative models

Z Huang, Y He, J Yu, F Zhang, C Si… - Proceedings of the …, 2024 - openaccess.thecvf.com
Video generation has witnessed significant advancements yet evaluating these models
remains a challenge. A comprehensive evaluation benchmark for video generation is …