Large-scale multi-modal pre-trained models: A comprehensive survey

X Wang, G Chen, G Qian, P Gao, XY Wei… - Machine Intelligence …, 2023 - Springer
With the urgent demand for generalized deep models, many pre-trained big models are
proposed, such as bidirectional encoder representations (BERT), vision transformer (ViT) …

Survey on large language model-enhanced reinforcement learning: Concept, taxonomy, and methods

Y Cao, H Zhao, Y Cheng, T Shu, Y Chen… - … on Neural Networks …, 2024 - ieeexplore.ieee.org
With extensive pretrained knowledge and high-level general capabilities, large language
models (LLMs) emerge as a promising avenue to augment reinforcement learning (RL) in …

Multimodal learning with transformers: A survey

P Xu, X Zhu, DA Clifton - IEEE Transactions on Pattern Analysis …, 2023 - ieeexplore.ieee.org
Transformer is a promising neural network learner, and has achieved great success in
various machine learning tasks. Thanks to the recent prevalence of multimodal applications …

V2X-ViT: Vehicle-to-everything cooperative perception with vision transformer

R Xu, H Xiang, Z Tu, X Xia, MH Yang, J Ma - European conference on …, 2022 - Springer
In this paper, we investigate the application of Vehicle-to-Everything (V2X) communication to
improve the perception performance of autonomous vehicles. We present a robust …

R2Former: Unified retrieval and reranking transformer for place recognition

S Zhu, L Yang, C Chen, M Shah… - Proceedings of the …, 2023 - openaccess.thecvf.com
Visual Place Recognition (VPR) estimates the location of query images by matching
them with images in a reference database. Conventional methods generally adopt …

Semantic segmentation using Vision Transformers: A survey

H Thisanke, C Deshan, K Chamith… - … Applications of Artificial …, 2023 - Elsevier
Semantic segmentation has a broad range of applications in a variety of domains including
land coverage analysis, autonomous driving, and medical image analysis. Convolutional …

A survey of the vision transformers and their CNN-transformer based variants

A Khan, Z Rauf, A Sohail, AR Khan, H Asif… - Artificial Intelligence …, 2023 - Springer
Vision transformers have become popular as a possible substitute to convolutional neural
networks (CNNs) for a variety of computer vision applications. These transformers, with their …

Video transformers: A survey

J Selva, AS Johansen, S Escalera… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org
Transformer models have shown great success handling long-range interactions, making
them a promising tool for modeling video. However, they lack inductive biases and scale …

CSwin-PNet: A CNN-Swin Transformer combined pyramid network for breast lesion segmentation in ultrasound images

H Yang, D Yang - Expert Systems with Applications, 2023 - Elsevier
Currently, the automatic segmentation of breast tumors based on breast ultrasound (BUS)
images is still a challenging task. Most lesion segmentation methods are implemented …

Transformers in speech processing: A survey

S Latif, A Zaidi, H Cuayahuitl, F Shamshad… - arXiv preprint arXiv …, 2023 - arxiv.org
The remarkable success of transformers in the field of natural language processing has
sparked the interest of the speech-processing community, leading to an exploration of their …