Unlocking the potential of edge computing for hyperspectral image classification: An efficient low-energy strategy

G De Lucia, M Lapegna, D Romano - Future Generation Computer Systems, 2023 - Elsevier
Despite recent improvements, the computing capability of Edge Computing devices is still
inferior to high-end servers, so special methodologies are required to consider the …

Novel accelerated methods for convolution neural network with matrix core

Y Guo, L Lu, S Zhu - The Journal of Supercomputing, 2023 - Springer
The powerful parallel computing capability of GPU and the development of matrix
processing unit in recent years provide more possibilities to improve the performance of …

Hierarchical Model Parallelism for Optimizing Inference on Many-core Processor via Decoupled 3D-CNN Structure

J Jiang, Z Huang, D Huang, J Du, L Chen… - ACM Transactions on …, 2023 - dl.acm.org
The tremendous success of convolutional neural network (CNN) has made it ubiquitous in
many fields of human endeavor. Many applications such as biomedical analysis and …

Complementary Sparsity: Accelerating Sparse CNNs with High Accuracy on General-Purpose Computing Platforms

K Zhao, Y Tan, K Han, T Hu, H Chen… - … on Machine Learning …, 2023 - openreview.net
Model sparsity is a promising approach to reducing parameters or FLOPs of convolutional
neural networks (CNNs). Compared to unstructured or coarse-grained structured sparsity …

Sophisticated Orchestrating Concurrent DLRM Training on CPU/GPU Platform

R Tian, J Jiang, J Du, D Huang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Recommendation systems are essential to the operation of the majority of internet services,
with Deep Learning Recommendation Models (DLRMs) serving as a crucial component …

High-Performance 3D convolution on the Latest Generation Sunway Processor

J Li, Z Feng, Y Gao, S Tian, H Zhang, H Ye… - Proceedings of the 53rd …, 2024 - dl.acm.org
The emergence of High-Performance Computing (HPC) and Artificial Intelligence (AI) has
significantly expanded the applications of three-dimensional convolutional neural networks …

Joint Segmentation and Sub-pixel Localization in Structured Light Laryngoscopy

JO Henningson, M Semmler, M Döllinger… - … Conference on Medical …, 2023 - Springer
In recent years, phoniatric diagnostics has seen a surge of interest in structured light-based
high-speed video endoscopy, as it enables the observation of oscillating human vocal folds …

MixRec: Orchestrating Concurrent Recommendation Model Training on CPU-GPU platform

J Jiang, R Tian, J Du, D Huang… - 2023 IEEE 41st …, 2023 - ieeexplore.ieee.org
The development of deep learning recommendation models (DLRM) and recommendation
systems has significantly improved the precision of information matching. Due to distinct …

Convolutional neural network inference and training vectorization method for multicore vector accelerators

J Chen, C Li, Z Liu - Computer Engineering & Science, 2024 - joces.nudt.edu.cn
With the widespread application of deep learning, represented by convolutional neural
networks (CNNs), the computational requirements of neural network models have increased …

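Convolutional neural network inference and training vectorization method for multicore vector accelerators

J Chen, C Li, Z Liu - Computer Engineering & Science, 2024 - joces.nudt.edu.cn
With the widespread application of deep learning, represented by convolutional neural networks, the amount of computation in neural network models has also grown rapidly,
driving the development of deep learning accelerators. How to accelerate and optimize neural networks according to the architectural characteristics of accelerator hardware …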