GPU-based embedded intelligence architectures and applications

LM Ang, KP Seng - Electronics, 2021 - mdpi.com
This paper presents contributions to the state of the art in graphics processing unit (GPU)-based
embedded intelligence (EI) research for architectures and applications. This paper …

Self-aware distributed deep learning framework for heterogeneous IoT edge devices

Y **, J Cai, J Xu, Y Huan, Y Yan, B Huang… - Future Generation …, 2021 - Elsevier
Implementing artificial intelligence (AI) in the Internet of Things (IoT) involves a move from
the cloud to the heterogeneous and low-power edge, following an urgent demand for …

Efficient use of GPU memory for large-scale deep learning model training

H Choi, J Lee - Applied Sciences, 2021 - mdpi.com
To achieve high accuracy when performing deep learning, it is necessary to use a large-
scale training model. However, due to the limitations of GPU memory, it is difficult to train …
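As a concrete illustration of the memory/compute trade-off this line of work studies, the sketch below uses gradient checkpointing via PyTorch's torch.utils.checkpoint, which recomputes activations in the backward pass instead of storing them. The technique choice, model, and sizes here are our illustrative assumptions, not necessarily the paper's method.

```python
# Minimal sketch: trading compute for GPU memory via gradient checkpointing.
# Activations inside each checkpointed block are not kept; they are recomputed
# during backward, lowering peak memory at the cost of extra FLOPs.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class CheckpointedMLP(nn.Module):  # hypothetical model for illustration
    def __init__(self, width=4096, depth=8):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(width, width), nn.ReLU()) for _ in range(depth)
        )

    def forward(self, x):
        for block in self.blocks:
            x = checkpoint(block, x, use_reentrant=False)
        return x

device = "cuda" if torch.cuda.is_available() else "cpu"
model = CheckpointedMLP().to(device)
x = torch.randn(32, 4096, device=device, requires_grad=True)
model(x).sum().backward()
```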

Computationally efficient neural rendering for generator adversarial networks using a multi-GPU cluster in a cloud environment

A Ravikumar, H Sriraman - IEEE Access, 2023 - ieeexplore.ieee.org
Owing to the excellent quality of the images they create, Generator Adversarial
Networks have recently become a viable option for image reconstruction. The main problem …

Bandwidth Characterization of DeepSpeed on Distributed Large Language Model Training

B Hanindhito, B Patel, LK John - 2024 IEEE International …, 2024 - ieeexplore.ieee.org
The exponential growth of the training dataset and the size of the large language model
(LLM) significantly outpaces the incremental memory capacity increase in the graphics pro …
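For context, DeepSpeed training is driven by a JSON-style config; the sketch below shows an illustrative ZeRO stage-2 setup. All values, the toy model, and the optimizer choice are assumptions, not the paper's experimental settings. ZeRO partitions optimizer state and gradients across ranks, which is exactly what makes its bandwidth behavior worth characterizing.

```python
# Minimal sketch of initializing a model with DeepSpeed ZeRO.
# Launch under the DeepSpeed runner, e.g.:  deepspeed train.py
import deepspeed
import torch.nn as nn

ds_config = {
    "train_micro_batch_size_per_gpu": 4,     # illustrative values
    "gradient_accumulation_steps": 8,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "zero_optimization": {
        "stage": 2,  # partition optimizer state + gradients across ranks;
                     # higher stages save memory but add communication traffic
    },
    "fp16": {"enabled": True},
}

model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 10))
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```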

Towards accelerating model parallelism in distributed deep learning systems

H Choi, BH Lee, SY Chun, J Lee - Plos one, 2023 - journals.plos.org
Modern deep neural networks often cannot be trained on a single GPU due to large model
and data sizes. Model parallelism splits a model across multiple GPUs, but making it …
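A minimal sketch of the baseline this line of work accelerates: naive, non-pipelined model parallelism in PyTorch, assuming two GPUs visible as cuda:0 and cuda:1 (layer sizes are illustrative). Without pipelining, one GPU idles while the other computes, which is the inefficiency such acceleration targets.

```python
# Minimal sketch of naive model parallelism: the model is split into two
# stages on different GPUs, and activations are copied between devices.
import torch
import torch.nn as nn

class TwoStageModel(nn.Module):  # hypothetical two-GPU split
    def __init__(self):
        super().__init__()
        self.stage0 = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU()).to("cuda:0")
        self.stage1 = nn.Linear(4096, 10).to("cuda:1")

    def forward(self, x):
        x = self.stage0(x.to("cuda:0"))
        # The activation transfer below is the inter-GPU communication point.
        return self.stage1(x.to("cuda:1"))

model = TwoStageModel()
out = model(torch.randn(32, 1024))
```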

Performance analysis of distributed deep learning frameworks in a multi-GPU environment

T Kavarakuntla, L Han, H Lloyd… - … (IUCC/CIT/DSCI …, 2021 - ieeexplore.ieee.org
Deep learning frameworks such as TensorFlow, MXNet, and Chainer provide many basic
building blocks for designing effective neural network models for various applications (e.g. …
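The common pattern such comparisons benchmark is synchronous data parallelism. Below is a minimal PyTorch DistributedDataParallel sketch; the model, sizes, and launch command are illustrative assumptions rather than the paper's setup.

```python
# Minimal data-parallel training loop with PyTorch DDP.
# Run with, e.g.:  torchrun --nproc_per_node=4 this_script.py
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group("nccl")
local_rank = int(os.environ.get("LOCAL_RANK", 0))
torch.cuda.set_device(local_rank)

model = DDP(nn.Linear(128, 10).to(local_rank), device_ids=[local_rank])
opt = torch.optim.SGD(model.parameters(), lr=0.01)

for _ in range(5):
    x = torch.randn(64, 128, device=local_rank)
    loss = model(x).sum()
    opt.zero_grad()
    loss.backward()  # gradients are all-reduced across workers here
    opt.step()

dist.destroy_process_group()
```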

Sparse Attention Graph Gated Recurrent Unit for Spatiotemporal Behind-The-Meter Load and PV Disaggregation

M Khodayar, AF Bavil, M Saffari - 2024 16th International …, 2024 - ieeexplore.ieee.org
The increasing adoption of rooftop photovoltaic (PV) power generation systems in
residential areas necessitates accurate monitoring and disaggregation of behind-the-meter …

MNN: A solution to implement neural networks into a memory-based reconfigurable logic device (MRLD)

X Zhou, S Wang, Y Higami, H Takahashi… - … on Circuits/Systems …, 2021 - ieeexplore.ieee.org
MRLD™ is a new type of reconfigurable device constructed from general SRAM arrays
(multiple LUTs), which has advantages including small delay, low power, and low …
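To make the LUT-based idea concrete, here is a small Python sketch of evaluating a quantized neuron by table lookup. This is our illustration of the general principle behind memory-based logic, not the MNN mapping itself; all sizes and weights are made up.

```python
# Minimal sketch: a neuron with low-bit quantized inputs can be evaluated by
# a precomputed lookup table, the basic idea behind implementing networks on
# LUT/SRAM-based devices.
import itertools
import numpy as np

BITS = 2       # 2-bit inputs -> 4 quantization levels
N_INPUTS = 4   # 4 inputs -> 4**4 = 256 table entries

w = np.array([0.5, -1.0, 0.25, 0.75])          # hypothetical weights
levels = np.linspace(0.0, 1.0, 2 ** BITS)       # quantized input values

# Precompute the neuron's output (ReLU of the dot product) for every possible
# quantized input combination; at "run time" only a lookup remains.
lut = {
    codes: max(0.0, float(np.dot(w, levels[list(codes)])))
    for codes in itertools.product(range(2 ** BITS), repeat=N_INPUTS)
}

print(lut[(3, 0, 1, 2)])  # evaluate the neuron with a single table lookup
```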

Empirical performance analysis of collective communication for distributed deep learning in a many-core CPU environment

J Woo, H Choi, J Lee - Applied Sciences, 2020 - mdpi.com
To accommodate large amounts of training data and complex training models, "distributed" deep
learning training has been employed more and more frequently. However …
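At the core of such measurements is the all-reduce collective that sums gradients across workers. Below is a minimal torch.distributed sketch using the CPU-side gloo backend, matching the many-core CPU setting; the launch command and tensor size are assumptions.

```python
# Minimal sketch of the collective at the heart of data-parallel training.
# Run with, e.g.:  torchrun --nproc_per_node=4 this_script.py
import torch
import torch.distributed as dist

dist.init_process_group("gloo")  # gloo backend runs collectives on CPUs
rank = dist.get_rank()

grad = torch.full((1024,), float(rank))   # stand-in for a local gradient
dist.all_reduce(grad, op=dist.ReduceOp.SUM)

# Every rank now holds the same summed tensor: 0 + 1 + ... + (world_size - 1).
print(rank, grad[0].item())
dist.destroy_process_group()
```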