Large models for intelligent transportation systems and autonomous vehicles: A survey
Large models are widely used in intelligent transportation systems (ITS) and autonomous
vehicles (AV) due to their excellent new capabilities such as intelligence emergence …
Accelerating drug discovery in AutoDock-GPU with tensor cores
In drug discovery, molecular docking aims at characterizing the binding of a drug-like
molecule to a macromolecule. AutoDock-GPU, a state-of-the-art docking software, estimates …
A Review on Large-Scale Data Processing with Parallel and Distributed Randomized Extreme Learning Machine Neural Networks
The randomization-based feedforward neural network has raised great interest in the
scientific community due to its simplicity, training speed, and accuracy comparable to …
High performance hierarchical Tucker tensor learning using GPU tensor cores
Extracting information from large-scale high-dimensional data is a fundamentally important
task in high performance computing, where the hierarchical Tucker (HT) tensor learning …
Accelerating Fourier and number theoretic transforms using tensor cores and warp shuffles
S Durrani, MS Chughtai, M Hidayetoglu… - 2021 30th …, 2021 - ieeexplore.ieee.org
The discrete Fourier transform (DFT) and its specialized case, the number theoretic
transform (NTT), are two important mathematical tools having applications in several areas …
DPCrypto: Acceleration of post-quantum cryptography using dot-product instructions on GPUs
Modern NVIDIA GPU architectures offer dot-product instructions (DP2A and DP4A), with the
aim of accelerating machine learning and scientific computing applications. These dot …
On the rise of AMD matrix cores: Performance, power efficiency, and programmability
G Schieffer, DA De Medeiros, J Faj… - … Analysis of Systems …, 2024 - ieeexplore.ieee.org
Matrix multiplication is a core computational part of deep learning and scientific workloads.
The emergence of Matrix Cores in high-end AMD GPUs, a building block of Exascale …
High-Performance Tensor-Train Primitives Using GPU Tensor Cores
Learning tensor-train (TT) structure (aka matrix product state (MPS) representation) from
large-scale high-dimensional data has been a common task in big data analysis, deep …
A novel parallel algorithm for sparse tensor matrix chain multiplication via TCU-acceleration
H Wang, W Yang, R Hu, R Ouyang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Analysis of multi-dimensional data, especially tensor decomposition, which extracts latent
information, is becoming considerably popular. Although multi-dimensional sparse data is …
Accelerating range minimum queries with ray tracing cores
Over the past decade, GPU technology has undergone a notable transformation, evolving
from pure general-purpose computation to the integration of application-specific integrated …