The deep learning compiler: A comprehensive survey

M Li, Y Liu, X Liu, Q Sun, X You, H Yang… - … on Parallel and …, 2020 - ieeexplore.ieee.org
The difficulty of deploying various deep learning (DL) models on diverse DL hardware has
boosted the research and development of DL compilers in the community. Several DL …

A survey of techniques for optimizing transformer inference

KT Chitty-Venkata, S Mittal, M Emani… - Journal of Systems …, 2023 - Elsevier
Recent years have seen a phenomenal rise in the performance and applications of
transformer neural networks. The family of transformer networks, including Bidirectional …

Efficient large language models: A survey

Z Wan, X Wang, C Liu, S Alam, Y Zheng, J Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have demonstrated remarkable capabilities in important
tasks such as natural language understanding and language generation, and thus have the …

Ten lessons from three generations shaped Google's TPUv4i: Industrial product

NP Jouppi, DH Yoon, M Ashcraft… - 2021 ACM/IEEE 48th …, 2021 - ieeexplore.ieee.org
Google deployed several TPU generations since 2015, teaching us lessons that changed
our views: semiconductor technology advances unequally; compiler compatibility trumps …

Filtering, distillation, and hard negatives for vision-language pre-training

F Radenovic, A Dubey, A Kadian… - Proceedings of the …, 2023 - openaccess.thecvf.com
Vision-language models trained with contrastive learning on large-scale noisy data are
becoming increasingly popular for zero-shot recognition problems. In this paper we improve …

Deepseek-v3 technical report

A Liu, B Feng, B Xue, B Wang, B Wu, C Lu… - arXiv preprint arXiv …, 2024 - arxiv.org
We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B
total parameters, of which 37B are activated for each token. To achieve efficient inference and cost …

A domain-specific supercomputer for training deep neural networks

NP Jouppi, DH Yoon, G Kurian, S Li, N Patil… - Communications of the …, 2020 - dl.acm.org
Communications of the ACM, July 2020, Vol. 63, No. 7, DOI: 10.1145/3360307. Google's TPU …

[Book] Efficient processing of deep neural networks

V Sze, YH Chen, TJ Yang, JS Emer - 2020 - Springer
This book provides a structured treatment of the key principles and techniques for enabling
efficient processing of deep neural networks (DNNs). DNNs are currently widely used for …

Pushing the limits of narrow precision inferencing at cloud scale with microsoft floating point

B Darvish Rouhani, D Lo, R Zhao… - Advances in neural …, 2020 - proceedings.neurips.cc
In this paper, we explore the limits of Microsoft Floating Point (MSFP), a new class of
datatypes developed for production cloud-scale inferencing on custom hardware. Through …

Large language models in finance: A survey

Y Li, S Wang, H Ding, H Chen - … ACM international conference on AI in …, 2023 - dl.acm.org
Recent advances in large language models (LLMs) have opened new possibilities for
artificial intelligence applications in finance. In this paper, we provide a practical survey …