GPU-based embedded intelligence architectures and applications

LM Ang, KP Seng - Electronics, 2021 - mdpi.com
This paper presents contributions to the state of the art in graphics processing unit (GPU)-based
embedded intelligence (EI) research for architectures and applications. This paper …

Self-aware distributed deep learning framework for heterogeneous IoT edge devices

Y **, J Cai, J Xu, Y Huan, Y Yan, B Huang… - Future Generation …, 2021 - Elsevier
Implementing artificial intelligence (AI) in the Internet of Things (IoT) involves a move from
the cloud to the heterogeneous and low-power edge, following an urgent demand for …

Efficient use of GPU memory for large-scale deep learning model training

H Choi, J Lee - Applied Sciences, 2021 - mdpi.com
To achieve high accuracy when performing deep learning, it is necessary to use a large-
scale training model. However, due to the limitations of GPU memory, it is difficult to train …
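As a concrete illustration of the memory/compute trade-off this line of work studies, the sketch below uses gradient checkpointing via PyTorch's torch.utils.checkpoint, which recomputes activations in the backward pass instead of storing them. The technique choice, model, and sizes here are our illustrative assumptions, not necessarily the paper's method.

```python
# Minimal sketch: trading compute for GPU memory via gradient checkpointing.
# Activations inside each checkpointed block are not kept; they are recomputed
# during backward, lowering peak memory at the cost of extra FLOPs.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class CheckpointedMLP(nn.Module):  # hypothetical model for illustration
    def __init__(self, width=4096, depth=8):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(width, width), nn.ReLU()) for _ in range(depth)
        )

    def forward(self, x):
        for block in self.blocks:
            x = checkpoint(block, x, use_reentrant=False)
        return x

device = "cuda" if torch.cuda.is_available() else "cpu"
model = CheckpointedMLP().to(device)
x = torch.randn(32, 4096, device=device, requires_grad=True)
model(x).sum().backward()
```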

Computationally efficient neural rendering for generator adversarial networks using a multi-GPU cluster in a cloud environment

A Ravikumar, H Sriraman - IEEE Access, 2023 - ieeexplore.ieee.org
Owing to the excellent quality of the images they create, Generator Adversarial
Networks have recently become a viable option for image reconstruction. The main problem …

Bandwidth Characterization of DeepSpeed on Distributed Large Language Model Training

B Hanindhito, B Patel, LK John - 2024 IEEE International …, 2024 - ieeexplore.ieee.org
The exponential growth of the training dataset and the size of the large language model
(LLM) significantly outpaces the incremental memory capacity increase in the graphics pro …
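For context, DeepSpeed training is driven by a JSON-style config; the sketch below shows an illustrative ZeRO stage-2 setup. All values, the toy model, and the optimizer choice are assumptions, not the paper's experimental settings. ZeRO partitions optimizer state and gradients across ranks, which is exactly what makes its bandwidth behavior worth characterizing.

```python
# Minimal sketch of initializing a model with DeepSpeed ZeRO.
# Launch under the DeepSpeed runner, e.g.:  deepspeed train.py
import deepspeed
import torch.nn as nn

ds_config = {
    "train_micro_batch_size_per_gpu": 4,     # illustrative values
    "gradient_accumulation_steps": 8,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "zero_optimization": {
        "stage": 2,  # partition optimizer state + gradients across ranks;
                     # higher stages save memory but add communication traffic
    },
    "fp16": {"enabled": True},
}

model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 10))
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```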

Towards accelerating model parallelism in distributed deep learning systems

H Choi, BH Lee, SY Chun, J Lee - Plos one, 2023 - journals.plos.org
Modern deep neural networks often cannot be trained on a single GPU due to large model
and data sizes. Model parallelism splits a model across multiple GPUs, but making it …
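A minimal sketch of the baseline this line of work accelerates: naive, non-pipelined model parallelism in PyTorch, assuming two GPUs visible as cuda:0 and cuda:1 (layer sizes are illustrative). Without pipelining, one GPU idles while the other computes, which is the inefficiency such acceleration targets.

```python
# Minimal sketch of naive model parallelism: the model is split into two
# stages on different GPUs, and activations are copied between devices.
import torch
import torch.nn as nn

class TwoStageModel(nn.Module):  # hypothetical two-GPU split
    def __init__(self):
        super().__init__()
        self.stage0 = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU()).to("cuda:0")
        self.stage1 = nn.Linear(4096, 10).to("cuda:1")

    def forward(self, x):
        x = self.stage0(x.to("cuda:0"))
        # The activation transfer below is the inter-GPU communication point.
        return self.stage1(x.to("cuda:1"))

model = TwoStageModel()
out = model(torch.randn(32, 1024))
```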

Performance analysis of distributed deep learning frameworks in a multi-GPU environment

T Kavarakuntla, L Han, H Lloyd… - … (IUCC/CIT/DSCI …, 2021 - ieeexplore.ieee.org
Deep learning frameworks such as TensorFlow, MXNet, and Chainer provide many basic
building blocks for designing effective neural network models for various applications (e.g. …
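The common pattern such comparisons benchmark is synchronous data parallelism. Below is a minimal PyTorch DistributedDataParallel sketch; the model, sizes, and launch command are illustrative assumptions rather than the paper's setup.

```python
# Minimal data-parallel training loop with PyTorch DDP.
# Run with, e.g.:  torchrun --nproc_per_node=4 this_script.py
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group("nccl")
local_rank = int(os.environ.get("LOCAL_RANK", 0))
torch.cuda.set_device(local_rank)

model = DDP(nn.Linear(128, 10).to(local_rank), device_ids=[local_rank])
opt = torch.optim.SGD(model.parameters(), lr=0.01)

for _ in range(5):
    x = torch.randn(64, 128, device=local_rank)
    loss = model(x).sum()
    opt.zero_grad()
    loss.backward()  # gradients are all-reduced across workers here
    opt.step()

dist.destroy_process_group()
```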

Sparse Attention Graph Gated Recurrent Unit for Spatiotemporal Behind-The-Meter Load and PV Disaggregation

M Khodayar, AF Bavil, M Saffari - 2024 16th International …, 2024 - ieeexplore.ieee.org
The increasing adoption of rooftop photovoltaic (PV) power generation systems in
residential areas necessitates accurate monitoring and disaggregation of behind-the-meter …

MNN: A solution to implement neural networks into a memory-based reconfigurable logic device (MRLD)

X Zhou, S Wang, Y Higami, H Takahashi… - … on Circuits/Systems …, 2021 - ieeexplore.ieee.org
MRLD™ is a new type of reconfigurable device constructed from general SRAM arrays
(multiple LUTs), which has advantages including small delay, low power, and low …
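To make the LUT-based idea concrete, here is a small Python sketch of evaluating a quantized neuron by table lookup. This is our illustration of the general principle behind memory-based logic, not the MNN mapping itself; all sizes and weights are made up.

```python
# Minimal sketch: a neuron with low-bit quantized inputs can be evaluated by
# a precomputed lookup table, the basic idea behind implementing networks on
# LUT/SRAM-based devices.
import itertools
import numpy as np

BITS = 2       # 2-bit inputs -> 4 quantization levels
N_INPUTS = 4   # 4 inputs -> 4**4 = 256 table entries

w = np.array([0.5, -1.0, 0.25, 0.75])          # hypothetical weights
levels = np.linspace(0.0, 1.0, 2 ** BITS)       # quantized input values

# Precompute the neuron's output (ReLU of the dot product) for every possible
# quantized input combination; at "run time" only a lookup remains.
lut = {
    codes: max(0.0, float(np.dot(w, levels[list(codes)])))
    for codes in itertools.product(range(2 ** BITS), repeat=N_INPUTS)
}

print(lut[(3, 0, 1, 2)])  # evaluate the neuron with a single table lookup
```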

Empirical performance analysis of collective communication for distributed deep learning in a many-core CPU environment

J Woo, H Choi, J Lee - Applied Sciences, 2020 - mdpi.com
To accommodate large amounts of training data and complex training models, "distributed" deep
learning training has been employed more and more frequently. However …
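At the core of such measurements is the all-reduce collective that sums gradients across workers. Below is a minimal torch.distributed sketch using the CPU-side gloo backend, matching the many-core CPU setting; the launch command and tensor size are assumptions.

```python
# Minimal sketch of the collective at the heart of data-parallel training.
# Run with, e.g.:  torchrun --nproc_per_node=4 this_script.py
import torch
import torch.distributed as dist

dist.init_process_group("gloo")  # gloo backend runs collectives on CPUs
rank = dist.get_rank()

grad = torch.full((1024,), float(rank))   # stand-in for a local gradient
dist.all_reduce(grad, op=dist.ReduceOp.SUM)

# Every rank now holds the same summed tensor: 0 + 1 + ... + (world_size - 1).
print(rank, grad[0].item())
dist.destroy_process_group()
```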