Large models for intelligent transportation systems and autonomous vehicles: A survey

L Gan, W Chu, G Li, X Tang, K Li - Advanced Engineering Informatics, 2024 - Elsevier
Large models are widely used in intelligent transportation systems (ITS) and autonomous
vehicles (AV) due to their excellent new capabilities such as intelligence emergence …

Accelerating drug discovery in AutoDock-GPU with tensor cores

G Schieffer, I Peng - European Conference on Parallel Processing, 2023 - Springer
In drug discovery, molecular docking aims at characterizing the binding of a drug-like
molecule to a macromolecule. AutoDock-GPU, a state-of-the-art docking software, estimates …

A Review on Large-Scale Data Processing with Parallel and Distributed Randomized Extreme Learning Machine Neural Networks

E Gelvez-Almeida, M Mora, RJ Barrientos… - Mathematical and …, 2024 - mdpi.com
The randomization-based feedforward neural network has raised great interest in the
scientific community due to its simplicity, training speed, and accuracy comparable to …

High performance hierarchical Tucker tensor learning using GPU tensor cores

H Huang, XY Liu, W Tong, T Zhang… - IEEE Transactions …, 2022 - ieeexplore.ieee.org
Extracting information from large-scale high-dimensional data is a fundamentally important
task in high performance computing, where the hierarchical Tucker (HT) tensor learning …

Accelerating Fourier and number theoretic transforms using tensor cores and warp shuffles

S Durrani, MS Chughtai, M Hidayetoglu… - 2021 30th …, 2021 - ieeexplore.ieee.org
The discrete Fourier transform (DFT) and its specialized case, the number theoretic
transform (NTT), are two important mathematical tools having applications in several areas …

DPCrypto: Acceleration of post-quantum cryptography using dot-product instructions on GPUs

WK Lee, H Seo, SO Hwang, R Achar… - … on Circuits and …, 2022 - ieeexplore.ieee.org
Modern NVIDIA GPU architectures offer dot-product instructions (DP2A and DP4A), with the
aim of accelerating machine learning and scientific computing applications. These dot …

On the rise of AMD Matrix Cores: Performance, power efficiency, and programmability

G Schieffer, DA De Medeiros, J Faj… - … Analysis of Systems …, 2024 - ieeexplore.ieee.org
Matrix multiplication is a core computational part of deep learning and scientific workloads.
The emergence of Matrix Cores in high-end AMD GPUs, a building block of Exascale …

High-Performance Tensor-Train Primitives Using GPU Tensor Cores

XY Liu, H Hong, Z Zhang, W Tong… - IEEE Transactions …, 2024 - ieeexplore.ieee.org
Learning tensor-train (TT) structure (aka matrix product state (MPS) representation) from
large-scale high-dimensional data has been a common task in big data analysis, deep …

A novel parallel algorithm for sparse tensor matrix chain multiplication via TCU-acceleration

H Wang, W Yang, R Hu, R Ouyang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Analysis of multi-dimensional data, especially tensor decomposition, which extracts latent
information, is becoming considerably popular. Although multi-dimensional sparse data is …

Accelerating range minimum queries with ray tracing cores

E Meneses, CA Navarro, H Ferrada… - Future Generation …, 2024 - Elsevier
Over the past decade, GPU technology has undergone a notable transformation, evolving
from pure general-purpose computation to the integration of application-specific integrated …