A survey of human-in-the-loop for machine learning
Abstract Machine learning has become the state-of-the-art technique for many tasks
including computer vision, natural language processing, speech processing tasks, etc …
including computer vision, natural language processing, speech processing tasks, etc …
A review of deep learning methods for semantic segmentation of remote sensing imagery
X Yuan, J Shi, L Gu - Expert Systems with Applications, 2021 - Elsevier
Semantic segmentation of remote sensing imagery has been employed in many
applications and is a key research topic for decades. With the success of deep learning …
applications and is a key research topic for decades. With the success of deep learning …
Dinov2: Learning robust visual features without supervision
The recent breakthroughs in natural language processing for model pretraining on large
quantities of data have opened the way for similar foundation models in computer vision …
quantities of data have opened the way for similar foundation models in computer vision …
Imagebind: One embedding space to bind them all
We present ImageBind, an approach to learn a joint embedding across six different
modalities-images, text, audio, depth, thermal, and IMU data. We show that all combinations …
modalities-images, text, audio, depth, thermal, and IMU data. We show that all combinations …
A survey on multimodal large language models
Multimodal Large Language Model (MLLM) recently has been a new rising research
hotspot, which uses powerful Large Language Models (LLMs) as a brain to perform …
hotspot, which uses powerful Large Language Models (LLMs) as a brain to perform …
Convnext v2: Co-designing and scaling convnets with masked autoencoders
Driven by improved architectures and better representation learning frameworks, the field of
visual recognition has enjoyed rapid modernization and performance boost in the early …
visual recognition has enjoyed rapid modernization and performance boost in the early …
Internimage: Exploring large-scale vision foundation models with deformable convolutions
Compared to the great progress of large-scale vision transformers (ViTs) in recent years,
large-scale models based on convolutional neural networks (CNNs) are still in an early …
large-scale models based on convolutional neural networks (CNNs) are still in an early …
Self-supervised learning from images with a joint-embedding predictive architecture
This paper demonstrates an approach for learning highly semantic image representations
without relying on hand-crafted data-augmentations. We introduce the Image-based Joint …
without relying on hand-crafted data-augmentations. We introduce the Image-based Joint …
Masked autoencoders are scalable vision learners
This paper shows that masked autoencoders (MAE) are scalable self-supervised learners
for computer vision. Our MAE approach is simple: we mask random patches of the input …
for computer vision. Our MAE approach is simple: we mask random patches of the input …
Vbench: Comprehensive benchmark suite for video generative models
Video generation has witnessed significant advancements yet evaluating these models
remains a challenge. A comprehensive evaluation benchmark for video generation is …
remains a challenge. A comprehensive evaluation benchmark for video generation is …