A survey of modern deep learning based object detection models

SSA Zaidi, MS Ansari, A Aslam, N Kanwal… - Digital Signal …, 2022 - Elsevier
Object Detection is the task of classification and localization of objects in an image or video.
It has gained prominence in recent years due to its widespread applications. This article …

[HTML][HTML] Graph neural networks: A review of methods and applications

J Zhou, G Cui, S Hu, Z Zhang, C Yang, Z Liu, L Wang… - AI open, 2020 - Elsevier
Lots of learning tasks require dealing with graph data which contains rich relation
information among elements. Modeling physics systems, learning molecular fingerprints …

Yolov9: Learning what you want to learn using programmable gradient information

CY Wang, IH Yeh, HY Mark Liao - European conference on computer …, 2024 - Springer
Today's deep learning methods focus on how to design the objective functions to make the
prediction as close as possible to the target. Meanwhile, an appropriate neural network …

YOLOv6: A single-stage object detection framework for industrial applications

C Li, L Li, H Jiang, K Weng, Y Geng, L Li, Z Ke… - arxiv preprint arxiv …, 2022 - arxiv.org
For years, the YOLO series has been the de facto industry-level standard for efficient object
detection. The YOLO community has prospered overwhelmingly to enrich its use in a …

A convnet for the 2020s

Z Liu, H Mao, CY Wu, C Feichtenhofer… - Proceedings of the …, 2022 - openaccess.thecvf.com
The" Roaring 20s" of visual recognition began with the introduction of Vision Transformers
(ViTs), which quickly superseded ConvNets as the state-of-the-art image classification …

A generalist agent

S Reed, K Zolna, E Parisotto, SG Colmenarejo… - arxiv preprint arxiv …, 2022 - arxiv.org
Inspired by progress in large-scale language modeling, we apply a similar approach
towards building a single generalist agent beyond the realm of text outputs. The agent …

Zero-shot text-to-image generation

A Ramesh, M Pavlov, G Goh, S Gray… - International …, 2021 - proceedings.mlr.press
Text-to-image generation has traditionally focused on finding better modeling assumptions
for training on a fixed dataset. These assumptions might involve complex architectures …

nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation

F Isensee, PF Jaeger, SAA Kohl, J Petersen… - Nature …, 2021 - nature.com
Biomedical imaging is a driver of scientific discovery and a core component of medical care
and is being stimulated by the field of deep learning. While semantic segmentation …

Simam: A simple, parameter-free attention module for convolutional neural networks

L Yang, RY Zhang, L Li, X **e - International conference on …, 2021 - proceedings.mlr.press
In this paper, we propose a conceptually simple but very effective attention module for
Convolutional Neural Networks (ConvNets). In contrast to existing channel-wise and spatial …

Mvitv2: Improved multiscale vision transformers for classification and detection

Y Li, CY Wu, H Fan, K Mangalam… - Proceedings of the …, 2022 - openaccess.thecvf.com
In this paper, we study Multiscale Vision Transformers (MViTv2) as a unified architecture for
image and video classification, as well as object detection. We present an improved version …