Haojie Wang
Verified email at tsinghua.edu.cn
Title
Cited by
Year
PET: Optimizing tensor programs with partially equivalent transformations and automated corrections
H Wang, J Zhai, M Gao, Z Ma, S Tang, L Zheng, Y Li, K Rong, Y Chen, ...
15th USENIX Symposium on Operating Systems Design and Implementation (OSDI …, 2021
Cited by: 75 | Year: 2021
FasterMoE: modeling and optimizing training of large-scale dynamic pre-trained models
J He, J Zhai, T Antunes, H Wang, F Luo, S Shi, Q Li
Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of …, 2022
Cited by: 67 | Year: 2022
BaGuaLu: targeting brain scale pretrained models with over 37 million cores
Z Ma, J He, J Qiu, H Cao, Y Wang, Z Sun, L Zheng, H Wang, S Tang, ...
Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of …, 2022
Cited by: 59 | Year: 2022
HyQuas: hybrid partitioner based quantum circuit simulation system on GPU
C Zhang, Z Song, H Wang, K Rong, J Zhai
Proceedings of the 35th ACM International Conference on Supercomputing, 443-454, 2021
Cited by: 28 | Year: 2021
Spindle: Informed memory access monitoring
H Wang, J Zhai, X Tang, B Yu, X Ma, W Chen
2018 USENIX Annual Technical Conference (USENIX ATC 18), 561-574, 2018
Cited by: 26 | Year: 2018
Scaling graph traversal to 281 trillion edges with 40 million cores
H Cao, Y Wang, H Wang, H Lin, Z Ma, W Yin, W Chen
Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of …, 2022
Cited by: 23 | Year: 2022
Spread-n-share: improving application performance and cluster throughput with resource-aware job placement
X Tang, H Wang, X Ma, N El-Sayed, J Zhai, W Chen, A Aboulnaga
Proceedings of the International Conference for High Performance Computing …, 2019
Cited by: 18 | Year: 2019
FreeTensor: a free-form DSL with holistic optimizations for irregular tensor programs
S Tang, J Zhai, H Wang, L Jiang, L Zheng, Z Yuan, C Zhang
Proceedings of the 43rd ACM SIGPLAN International Conference on Programming …, 2022
Cited by: 14 | Year: 2022
: Large-Scale Graph Triangle Counting on a Single Machine Using GPUs
J Huang, H Wang, X Fei, X Wang, W Chen
IEEE Transactions on Parallel and Distributed Systems 33 (11), 3067-3078, 2021
Cited by: 13 | Year: 2021
UniQ: A unified programming model for efficient quantum circuit simulation
C Zhang, H Wang, Z Ma, L Xie, Z Song, J Zhai
SC22: International Conference for High Performance Computing, Networking …, 2022
Cited by: 12 | Year: 2022
ScalAna: Automating scaling loss detection with graph analysis
Y Jin, H Wang, T Yu, X Tang, T Hoefler, X Liu, J Zhai
SC20: International Conference for High Performance Computing, Networking …, 2020
Cited by: 12 | Year: 2020
PerFlow: A domain specific framework for automatic performance analysis of parallel applications
Y Jin, H Wang, R Zhong, C Zhang, J Zhai
Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of …, 2022
Cited by: 10 | Year: 2022
Vapro: Performance variance detection and diagnosis for production-run parallel applications
L Zheng, J Zhai, X Tang, H Wang, T Yu, Y Jin, SL Song, W Chen
Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of …, 2022
Cited by: 9 | Year: 2022
Efficiently emulating high-bitwidth computation with low-bitwidth hardware
Z Ma, H Wang, G Feng, C Zhang, L Xie, J He, S Chen, J Zhai
Proceedings of the 36th ACM International Conference on Supercomputing, 1-12, 2022
Cited by: 6 | Year: 2022
LotusSQL: SQL engine for high-performance big data systems
X Li, B Yu, G Feng, H Wang, W Chen
Big Data Mining and Analytics 4 (4), 252-265, 2021
Cited by: 6 | Year: 2021
Identifying scalability bottlenecks for large-scale parallel programs with graph analysis
Y Jin, H Wang, X Tang, T Hoefler, X Liu, J Zhai
Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of …, 2020
Cited by: 4 | Year: 2020
An Efficient Sparse CNNs Accelerator on FPGA
Y Zhang, H Jiang, X Li, H Wang, D Dong, Y Cao
2022 IEEE International Conference on Cluster Computing (CLUSTER), 504-505, 2022
Cited by: 2 | Year: 2022
OLLIE: Derivation-based tensor program optimizer
L Zheng, H Wang, J Zhai, M Hu, Z Ma, T Wang, S Tang, L Xie, K Huang, ...
arXiv preprint arXiv:2208.02025, 2022
Cited by: 2 | Year: 2022
Detecting performance variance for parallel applications without source code
J Zhai, L Zheng, F Zhang, X Tang, H Wang, T Yu, Y Jin, SL Song, W Chen
IEEE Transactions on Parallel and Distributed Systems 33 (12), 4239-4255, 2022
Cited by: 2 | Year: 2022
Sparker: Efficient reduction for more scalable machine learning with Spark
B Yu, H Cao, T Shan, H Wang, X Tang, W Chen
Proceedings of the 50th International Conference on Parallel Processing, 1-11, 2021
Cited by: 2 | Year: 2021
Articles 1–20