The deep learning compiler: A comprehensive survey
The difficulty of deploying various deep learning (DL) models on diverse DL hardware has
boosted the research and development of DL compilers in the community. Several DL …
A survey of techniques for optimizing transformer inference
Recent years have seen a phenomenal rise in the performance and applications of
transformer neural networks. The family of transformer networks, including Bidirectional …
Efficient large language models: A survey
Large Language Models (LLMs) have demonstrated remarkable capabilities in important
tasks such as natural language understanding and language generation, and thus have the …
Ten lessons from three generations shaped Google's TPUv4i: Industrial product
Google deployed several TPU generations since 2015, teaching us lessons that changed
our views: semi-conductor technology advances unequally; compiler compatibility trumps …
Filtering, distillation, and hard negatives for vision-language pre-training
Vision-language models trained with contrastive learning on large-scale noisy data are
becoming increasingly popular for zero-shot recognition problems. In this paper we improve …
DeepSeek-V3 technical report
We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B
total parameters with 37B activated for each token. To achieve efficient inference and cost …
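The sparse activation quoted above can be put in proportion with simple arithmetic: the share of parameters used per token is the ratio of the two counts. The following is a minimal sketch under that reading. The helper name `active_fraction` is hypothetical. The sketch compares parameter counts only; it says nothing about FLOPs, memory traffic, or how the 37B split between shared layers and routed experts.

```python
# Hypothetical helper: only the two parameter counts come from the
# DeepSeek-V3 abstract; the function itself is illustrative.
TOTAL_PARAMS_B = 671   # total parameters, in billions
ACTIVE_PARAMS_B = 37   # parameters activated for each token, in billions


def active_fraction(active: float, total: float) -> float:
    """Return the share of parameters a sparse MoE model uses per token."""
    if total <= 0 or not 0 <= active <= total:
        raise ValueError("need total > 0 and 0 <= active <= total")
    return active / total


frac = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
print(f"{frac:.1%} of parameters active per token")  # 5.5% of parameters active per token
# A dense model with the same total size would use every parameter per token.
print(f"dense equivalent touches {TOTAL_PARAMS_B / ACTIVE_PARAMS_B:.1f}x as many")  # 18.1x
```

By this count, each token passes through roughly one eighteenth of the model's weights.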
A domain-specific supercomputer for training deep neural networks
Communications of the ACM, Vol. 63, No. 7, July 2020, p. 67. DOI:10.1145/3360307. Google’s TPU …
[BOOK] Efficient processing of deep neural networks
This book provides a structured treatment of the key principles and techniques for enabling
efficient processing of deep neural networks (DNNs). DNNs are currently widely used for …
Pushing the limits of narrow precision inferencing at cloud scale with Microsoft Floating Point
In this paper, we explore the limits of Microsoft Floating Point (MSFP), a new class of
datatypes developed for production cloud-scale inferencing on custom hardware. Through …
Large language models in finance: A survey
Recent advances in large language models (LLMs) have opened new possibilities for
artificial intelligence applications in finance. In this paper, we provide a practical survey …