Jianyu Huang
Meta Platforms, Inc.
Verified email at meta.com
Title | Cited by | Year
The Llama 3 Herd of Models
A Dubey, A Jauhri, A Pandey, A Kadian, A Al-Dahle, A Letman, A Mathur, ...
arXiv preprint arXiv:2407.21783, 2024
Cited by: 2204 | Year: 2024
Deep Learning Recommendation Model for Personalization and Recommendation Systems
M Naumov, D Mudigere, HJM Shi, J Huang, N Sundaraman, J Park, ...
arXiv preprint arXiv:1906.00091, 2019
Cited by: 835 | Year: 2019
A Study of BFLOAT16 for Deep Learning Training
D Kalamkar, D Mudigere, N Mellempudi, D Das, K Banerjee, S Avancha, ...
arXiv preprint arXiv:1905.12322, 2019
Cited by: 387 | Year: 2019
Software-hardware co-design for fast and scalable training of deep learning recommendation models
D Mudigere, Y Hao, J Huang, Z Jia, A Tulloch, S Sridharan, X Liu, ...
Proceedings of the 49th Annual International Symposium on Computer …, 2022
Cited by: 126 | Year: 2022
Strassen's algorithm reloaded
J Huang, TM Smith, GM Henry, RA van de Geijn
High Performance Computing, Networking, Storage and Analysis, SC16 …, 2016
Cited by: 92 | Year: 2016
The Llama 3 Herd of Models
A Grattafiori, A Dubey, A Jauhri, A Pandey, A Kadian, A Al-Dahle, ...
arXiv e-prints, arXiv:2407.21783, 2024
Cited by: 62 | Year: 2024
FBGEMM: Enabling High-Performance Low-Precision Deep Learning Inference
D Khudia, J Huang, P Basu, S Deng, H Liu, J Park, M Smelyanskiy
arXiv preprint arXiv:2101.05615, 2021
Cited by: 54
Mahmoud Khorashadi, Pallab Bhattacharya, Petr Lapukhov, Maxim Naumov, Ajit Mathews, Lin Qiao, Mikhail Smelyanskiy, Bill Jia, and Vijay Rao. 2021. Software-Hardware Co-design …
D Mudigere, Y Hao, J Huang, Z Jia, A Tulloch, S Sridharan, X Liu, ...
arXiv preprint arXiv:2104.05158, 2022
Cited by: 53* | Year: 2022
Performance optimization for the k-nearest neighbors kernel on x86 architectures
CD Yu, J Huang, W Austin, B Xiao, G Biros
Proceedings of the International Conference for High Performance Computing …, 2015
Cited by: 49 | Year: 2015
Deep Learning Recommendation Model for Personalization and Recommendation Systems. CoRR abs/1906.00091 (2019)
M Naumov, D Mudigere, HJM Shi, J Huang, N Sundaraman, J Park, ...
arXiv preprint arXiv:1906.00091, 2019
Cited by: 45* | Year: 2019
Generating families of practical fast matrix multiplication algorithms
J Huang, L Rice, DA Matthews, RA van de Geijn
2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS …, 2017
Cited by: 39 | Year: 2017
High-performance, Distributed Training of Large-scale Deep Learning Recommendation Models
D Mudigere, Y Hao, J Huang, A Tulloch, S Sridharan, X Liu, M Ozdal, ...
arXiv preprint arXiv:2104.05158, 2021
Cited by: 35 | Year: 2021
Mixed-Precision Embedding Using a Cache
JA Yang, J Huang, J Park, PTP Tang, A Tulloch
arXiv preprint arXiv:2010.11305, 2020
Cited by: 33 | Year: 2020
Strassen’s Algorithm Reloaded on GPUs
J Huang, CD Yu, RA Geijn
ACM Transactions on Mathematical Software (TOMS) 46 (1), 1-22, 2020
Cited by: 25 | Year: 2020
Implementing Strassen's Algorithm with CUTLASS on NVIDIA Volta GPUs
J Huang, CD Yu, RA van de Geijn
arXiv preprint arXiv:1808.07984, 2018
Cited by: 23 | Year: 2018
Strassen's Algorithm for Tensor Contraction
J Huang, DA Matthews, RA van de Geijn
SIAM Journal on Scientific Computing 40 (3), C305-C326, 2018
Cited by: 23 | Year: 2018
A study of BFLOAT16 for deep learning training (2019)
D Kalamkar, D Mudigere, N Mellempudi, D Das, K Banerjee, S Avancha, ...
arXiv preprint arXiv:1905.12322, 2019
Cited by: 17 | Year: 2019
BLISlab: A Sandbox for Optimizing GEMM
J Huang, RA van de Geijn
arXiv preprint arXiv:1609.00076, 2016
Cited by: 16 | Year: 2016
Efficient soft-error detection for low-precision deep learning recommendation models
S Li, J Huang, PTP Tang, D Khudia, J Park, HD Dixit, Z Chen
2022 IEEE International Conference on Big Data (Big Data), 1556-1563, 2022
Cited by: 15 | Year: 2022
AdaEmbed: Adaptive Embedding for Large-Scale Recommendation Models
F Lai, W Zhang, R Liu, W Tsai, X Wei, Y Hu, S Devkota, J Huang, J Park, ...
17th USENIX Symposium on Operating Systems Design and Implementation (OSDI …, 2023
Cited by: 14 | Year: 2023
Articles 1–20