Scaling Deep Learning Computation over the Inter-Core Connected Intelligence Processor with T10

Y Liu, Y Xue, Y Cheng, L Ma, Z Miao, J Xue… - Proceedings of the ACM …, 2024 - dl.acm.org
As AI chips incorporate numerous parallelized cores to scale deep learning (DL) computing,
inter-core communication has recently been enabled by employing high-bandwidth and low-latency …

Uncovering Nested Data Parallelism and Data Reuse in DNN Computation with FractalTensor

S Liu, C Qi, Y Cao, C Yang, W Hu, X Shi… - Proceedings of the …, 2024 - dl.acm.org
To speed up computation, deep neural networks (DNNs) usually rely on highly optimized
tensor operators. Despite their effectiveness, tensor operators are often defined empirically …

[PDF][PDF] Cascade: A Platform for Delay-Sensitive Edge Intelligence

W Song, T Garrett, Y Yang, M Liu, E Tremel… - arXiv preprint arXiv …, 2023 - cs.cornell.edu
Interest in intelligent edge computing is surging, driven by improving connectivity and
hardware advances. This is creating a need: today's cloud platforms optimize for high …

[PDF][PDF] MAGPY: compiling eager mode DNN programs by monitoring execution states

C Zhang, R Dong, H Wang, R Zhong… - Proceedings of the …, 2024 - heheda12345.github.io
Real-world deep learning programs are often developed in dynamically typed programming
languages like Python, which usually have complex features, such as built-in functions and …

Unifying On-device Tensor Program Optimization through Large Foundation Model

Z Zhao, N Ling, K Liu, N Guan, G Xing - … of the 21st ACM Conference on …, 2023 - dl.acm.org
We present TensorBind, a novel approach aimed at unifying different hardware architectures
for compilation optimization. Our proposed framework establishes an embedding space to …

Composable Architecture Primitives for the Era of Efficient Generalization

ME Davies - 2024 - search.proquest.com
We are in the age of AI, with rapidly changing algorithms and hardware. As Deep Learning
(DL) applications have evolved, architects have turned to specialization to keep up with the …

Data-driven adaptable hardware accelerators for graph neural networks

P Puigdemont Plana - 2024 - upcommons.upc.edu
Graph Neural Networks (GNNs) excel in tasks like node classification, edge prediction, and
graph classification, with applications in recommendation systems, network analysis, and …
