A comprehensive survey on pretrained foundation models: A history from BERT to ChatGPT
Pretrained Foundation Models (PFMs) are regarded as the foundation for various
downstream tasks across different data modalities. A PFM (e.g., BERT, ChatGPT, GPT-4) is …
Transformers in medical imaging: A survey
Following unprecedented success on the natural language tasks, Transformers have been
successfully applied to several computer vision problems, achieving state-of-the-art results …
SegNeXt: Rethinking convolutional attention design for semantic segmentation
We present SegNeXt, a simple convolutional network architecture for semantic
segmentation. Recent transformer-based models have dominated the field of semantic …
Visual attention network
While originally designed for natural language processing tasks, the self-attention
mechanism has recently taken various computer vision areas by storm. However, the 2D …
Large selective kernel network for remote sensing object detection
Recent research on remote sensing object detection has largely focused on improving the
representation of oriented bounding boxes but has overlooked the unique prior knowledge …
The multi-modal fusion in visual question answering: a review of attention mechanisms
Visual Question Answering (VQA) is a significant cross-disciplinary issue in the
fields of computer vision and natural language processing that requires a computer to output …
Spike-driven transformer
Spiking Neural Networks (SNNs) provide an energy-efficient deep learning option
due to their unique spike-based event-driven (i.e., spike-driven) paradigm. In this paper, we …
Convolutional neural networks: A survey
M Krichen - Computers, 2023 - mdpi.com
Artificial intelligence (AI) has become a cornerstone of modern technology, revolutionizing
industries from healthcare to finance. Convolutional neural networks (CNNs) are a subset of …
Large-scale multi-modal pre-trained models: A comprehensive survey
With the urgent demand for generalized deep models, many pre-trained big models are
proposed, such as bidirectional encoder representations (BERT), vision transformer (ViT) …
YOLOv5-Tassel: Detecting tassels in RGB UAV imagery with improved YOLOv5 based on transfer learning
W Liu, K Quijano, MM Crawford - IEEE Journal of Selected …, 2022 - ieeexplore.ieee.org
Unmanned aerial vehicles (UAVs) equipped with lightweight sensors, such as RGB cameras
and LiDAR, have significant potential in precision agriculture, including object detection …