SpAtten: Efficient sparse attention architecture with cascade token and head pruning

H Wang, Z Zhang, S Han - 2021 IEEE International Symposium …, 2021 - ieeexplore.ieee.org
The attention mechanism is becoming increasingly popular in Natural Language Processing
(NLP) applications, showing superior performance to convolutional and recurrent …

PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation

W Zeng, X Ren, T Su, H Wang, Y Liao, Z Wang… - arxiv preprint arxiv …, 2021 - arxiv.org
Large-scale Pretrained Language Models (PLMs) have become the new paradigm for
Natural Language Processing (NLP). PLMs with hundreds of billions of parameters such as …

Capuchin: Tensor-based GPU memory management for deep learning

X Peng, X Shi, H Dai, H Jin, W Ma, Q Xiong… - Proceedings of the …, 2020 - dl.acm.org
In recent years, deep learning has gained unprecedented success in various domains; the
key to this success is larger and deeper deep neural networks (DNNs) that achieve very …

SiP-ML: high-bandwidth optical network interconnects for machine learning training

M Khani, M Ghobadi, M Alizadeh, Z Zhu… - Proceedings of the …, 2021 - dl.acm.org
This paper proposes optical network interconnects as a key enabler for building high-
bandwidth ML training clusters with strong scaling properties. Our design, called SiP-ML …

Towards efficient post-training quantization of pre-trained language models

H Bai, L Hou, L Shang, X Jiang… - Advances in neural …, 2022 - proceedings.neurips.cc
Network quantization has gained increasing attention with the rapid growth of large pre-
trained language models (PLMs). However, most existing quantization methods for PLMs …

Graph processing and machine learning architectures with emerging memory technologies: a survey

X Qian - Science China Information Sciences, 2021 - Springer
This paper surveys domain-specific architectures (DSAs) built from two emerging memory
technologies. Hybrid memory cube (HMC) and high bandwidth memory (HBM) can reduce …

Reconfigurability, why it matters in AI tasks processing: A survey of reconfigurable AI chips

S Wei, X Lin, F Tu, Y Wang, L Liu… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Nowadays, artificial intelligence (AI) technologies, especially deep neural networks (DNNs),
play a vital role in solving many problems in both academia and industry. In order to …

Metis: Fast Automatic Distributed Training on Heterogeneous GPUs

T Um, B Oh, M Kang, WY Lee, G Kim, D Kim… - 2024 USENIX Annual …, 2024 - usenix.org
As deep learning model sizes expand and new GPUs are released every year, the need for
distributed training on heterogeneous GPUs rises to fully harness under-utilized low-end …

Sextans: A streaming accelerator for general-purpose sparse-matrix dense-matrix multiplication

L Song, Y Chi, A Sohrabizadeh, Y Choi, J Lau… - Proceedings of the …, 2022 - dl.acm.org
Sparse-Matrix Dense-Matrix multiplication (SpMM) is the key operator for a wide range of
applications including scientific computing, graph processing, and deep learning …

Prague: High-performance heterogeneity-aware asynchronous decentralized training

Q Luo, J He, Y Zhuo, X Qian - Proceedings of the Twenty-Fifth …, 2020 - dl.acm.org
Distributed deep learning training usually adopts All-Reduce as the synchronization
mechanism for data parallel algorithms due to its high performance in homogeneous …