- Academic Search

D Lepikhin, HJ Lee, Y Xu, D Chen, O Firat… - ar… high-performance sparse operators can be difficult and …

Cited by 83

Multi-task temporal shift attention networks for on-device contactless vitals measurement

X Liu, J Fromm, S Patel… - Advances in Neural …, 2020 - proceedings.neurips.cc

Telehealth and remote health monitoring have become increasingly important during the
SARS-CoV-2 pandemic and it is widely expected that this will have a lasting impact on …

Cited by 290

A hardware–software blueprint for flexible deep learning specialization

T Moreau, T Chen, L Vega, J Roesch, E Yan… - IEEE Micro, 2019 - ieeexplore.ieee.org

This article describes the Versatile Tensor Accelerator (VTA), a programmable DL
architecture designed to be extensible in the face of evolving workloads. VTA achieves …

Cited by 199

Getting to the point: index sets and parallelism-preserving autodiff for pointful array programming

A Paszke, DD Johnson, D Duvenaud… - Proceedings of the …, 2021 - dl.acm.org

We present a novel programming language design that attempts to combine the clarity and
safety of high-level functional languages with the efficiency and parallelism of low-level …

Cited by 55

Graph IRs for impure higher-order languages: Making aggressive optimizations affordable with precise effect dependencies

O Bračevac, G Wei, S Jia, S Abeysinghe… - Proceedings of the …, 2023 - dl.acm.org

Graph-based intermediate representations (IRs) are widely used for powerful compiler
optimizations, either interprocedurally in pure functional languages, or intraprocedurally in …

Cited by 18

Demystifying differentiable programming: Shift/reset the penultimate backpropagator

F Wang, D Zheng, J Decker, X Wu… - Proceedings of the …, 2019 - dl.acm.org

Deep learning has seen tremendous success over the past decade in computer vision,
machine translation, and gameplay. This success rests crucially on gradient-descent …

Cited by 104

Zero bubble pipeline parallelism

P Qi, X Wan, G Huang, M Lin - arXiv preprint arXiv:2401.10241, 2023 - arxiv.org

Pipeline parallelism is one of the key components for large-scale distributed training, yet its
efficiency suffers from pipeline bubbles which were deemed inevitable. In this work, we …

Cited by 20

A tensor compiler with automatic data packing for simple and efficient fully homomorphic encryption

A Krastev, N Samardzic, S Langowski… - Proceedings of the …, 2024 - dl.acm.org

Fully Homomorphic Encryption (FHE) enables computing on encrypted data, letting clients
securely offload computation to untrusted servers. While enticing, FHE has two key …

Cited by 5

Saved to My library

Relay: A new ir for machine learning frameworks

Gshard: Scaling giant models with conditional computation and automatic sharding

Multi-task temporal shift attention networks for on-device contactless vitals measurement

A hardware–software blueprint for flexible deep learning specialization

Getting to the point: index sets and parallelism-preserving autodiff for pointful array programming

Graph IRs for impure higher-order languages: Making aggressive optimizations affordable with precise effect dependencies

Demystifying differentiable programming: Shift/reset the penultimate backpropagator

Zero bubble pipeline parallelism

A tensor compiler with automatic data packing for simple and efficient fully homomorphic encryption