Recovering single precision accuracy from Tensor Cores while surpassing the FP32 theoretical peak performance H Ootomo, R Yokota The International Journal of High Performance Computing Applications, 2022 | 44 | 2022 |
CAGRA: Highly parallel graph construction and approximate nearest neighbor search for GPUs H Ootomo, A Naruse, C Nolet, R Wang, T Feher, Y Wang The annual IEEE International Conference on Data Engineering (ICDE), 2024 | 22 | 2024 |
DGEMM on Integer Matrix Multiplication Unit H Ootomo, K Ozaki, R Yokota The International Journal of High Performance Computing Applications, 2024 | 9 | 2024 |
Reducing shared memory footprint to leverage high throughput on Tensor Cores and its flexible API extension library H Ootomo, R Yokota Proceedings of the International Conference on High Performance Computing in …, 2023 | 8 | 2023 |
Quantum Circuit Simulation by SGEMM Emulation on Tensor Cores and Automatic Precision Selection H Ootomo, H Manabe, K Harada, R Yokota International Conference on High Performance Computing, 259-276, 2023 | 6 | 2023 |
Fast symmetric eigenvalue decomposition via WY representation on Tensor Core S Zhang, R Shah, H Ootomo, R Yokota, P Wu Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and …, 2023 | 3 | 2023 |
TSQR on TensorCores H Ootomo, R Yokota SC19 research poster, 2019 | 3 | 2019 |
Mixed-Precision Random Projection for RandNLA on Tensor Cores H Ootomo, R Yokota PASC '23: Proceedings of the Platform for Advanced Scientific Computing …, 2023 | 2 | 2023 |
Custom 8-bit floating point value format for reducing shared memory bank conflict in approximate nearest neighbor search H Ootomo, A Naruse arXiv preprint arXiv:2301.06672, 2023 | 1 | 2023 |