- Academic Search

A survey of human-in-the-loop for machine learning

X Wu, L Xiao, Y Sun, J Zhang, T Ma, L He - Future Generation Computer …, 2022 - Elsevier

Machine learning has become the state-of-the-art technique for many tasks
including computer vision, natural language processing, speech processing tasks, etc …

Cited by 637 · All 6 versions

A review of deep learning methods for semantic segmentation of remote sensing imagery

X Yuan, J Shi, L Gu - Expert Systems with Applications, 2021 - Elsevier

Semantic segmentation of remote sensing imagery has been employed in many
applications and is a key research topic for decades. With the success of deep learning …

Cited by 609

Dinov2: Learning robust visual features without supervision

M Oquab, T Darcet, T Moutakanni, H Vo… - arxiv preprint arxiv …, 2023 - arxiv.org

The recent breakthroughs in natural language processing for model pretraining on large
quantities of data have opened the way for similar foundation models in computer vision …

Cited by 2198 · All 11 versions

Imagebind: One embedding space to bind them all

R Girdhar, A El-Nouby, Z Liu, M Singh… - Proceedings of the …, 2023 - openaccess.thecvf.com

We present ImageBind, an approach to learn a joint embedding across six different
modalities: images, text, audio, depth, thermal, and IMU data. We show that all combinations …

Cited by 799 · All 7 versions

A survey on multimodal large language models

S Yin, C Fu, S Zhao, K Li, X Sun, T Xu… - arxiv preprint arxiv …, 2023 - arxiv.org

Multimodal Large Language Model (MLLM) recently has been a new rising research
hotspot, which uses powerful Large Language Models (LLMs) as a brain to perform …

Cited by 1061 · All 6 versions

Convnext v2: Co-designing and scaling convnets with masked autoencoders

S Woo, S Debnath, R Hu, X Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com

Driven by improved architectures and better representation learning frameworks, the field of
visual recognition has enjoyed rapid modernization and performance boost in the early …

Cited by 677 · All 8 versions

Internimage: Exploring large-scale vision foundation models with deformable convolutions

W Wang, J Dai, Z Chen, Z Huang, Z Li… - Proceedings of the …, 2023 - openaccess.thecvf.com

Compared to the great progress of large-scale vision transformers (ViTs) in recent years,
large-scale models based on convolutional neural networks (CNNs) are still in an early …

Cited by 791 · All 8 versions

Self-supervised learning from images with a joint-embedding predictive architecture

M Assran, Q Duval, I Misra… - Proceedings of the …, 2023 - openaccess.thecvf.com

This paper demonstrates an approach for learning highly semantic image representations
without relying on hand-crafted data-augmentations. We introduce the Image-based Joint …

Cited by 333 · All 7 versions

Masked autoencoders are scalable vision learners

K He, X Chen, S Xie, Y Li, P Dollár… - Proceedings of the …, 2022 - openaccess.thecvf.com

This paper shows that masked autoencoders (MAE) are scalable self-supervised learners
for computer vision. Our MAE approach is simple: we mask random patches of the input …

Cited by 8511 · All 11 versions

Vbench: Comprehensive benchmark suite for video generative models

Z Huang, Y He, J Yu, F Zhang, C Si… - Proceedings of the …, 2024 - openaccess.thecvf.com

Video generation has witnessed significant advancements yet evaluating these models
remains a challenge. A comprehensive evaluation benchmark for video generation is …

Cited by 202 · All 4 versions
