Scaling Deep Learning Computation over the Inter-Core Connected Intelligence Processor with T10
As AI chips incorporate numerous parallelized cores to scale deep learning (DL) computing,
inter-core communication has recently been enabled by employing high-bandwidth and low-latency …
Uncovering Nested Data Parallelism and Data Reuse in DNN Computation with FractalTensor
To speed up computation, deep neural networks (DNNs) usually rely on highly optimized
tensor operators. Despite their effectiveness, tensor operators are often defined empirically …
Cascade: A Platform for Delay-Sensitive Edge Intelligence
Interest in intelligent edge computing is surging, driven by improving connectivity and
hardware advances. This is creating a need: today's cloud platforms optimize for high …
MAGPY: Compiling Eager Mode DNN Programs by Monitoring Execution States
Real-world deep learning programs are often developed with dynamic programming
languages like Python, which usually have complex features, such as built-in functions and …
Unifying On-device Tensor Program Optimization through Large Foundation Model
We present TensorBind, a novel approach aimed at unifying different hardware architectures
for compilation optimization. Our proposed framework establishes an embedding space to …
Composable Architecture Primitives for the Era of Efficient Generalization
ME Davies - 2024 - search.proquest.com
We are in the age of AI, with rapidly changing algorithms and hardware. As Deep Learning
(DL) applications have evolved, architects have turned to specialization to keep up with the …
Data-driven adaptable hardware accelerators for graph neural networks
P Puigdemont Plana - 2024 - upcommons.upc.edu
Graph Neural Networks (GNNs) excel in tasks like node classification, edge prediction, and
graph classification, with applications in recommendation systems, network analysis, and …