Hardware implementation of memristor-based artificial neural networks

F Aguirre, A Sebastian, M Le Gallo, W Song… - Nature …, 2024 - nature.com
Artificial Intelligence (AI) is currently experiencing a bloom driven by deep learning (DL)
techniques, which rely on networks of connected simple computing units operating in …

Efficient acceleration of deep learning inference on resource-constrained edge devices: A review

MMH Shuvo, SK Islam, J Cheng… - Proceedings of the …, 2022 - ieeexplore.ieee.org
Successful integration of deep neural networks (DNNs) or deep learning (DL) has resulted
in breakthroughs in many areas. However, deploying these highly accurate models for data …

Pytorch 2: Faster machine learning through dynamic python bytecode transformation and graph compilation

J Ansel, E Yang, H He, N Gimelshein, A Jain… - Proceedings of the 29th …, 2024 - dl.acm.org
This paper introduces two extensions to the popular PyTorch machine learning framework,
TorchDynamo and TorchInductor, which implement the torch.compile feature released in …

Edge learning using a fully integrated neuro-inspired memristor chip

W Zhang, P Yao, B Gao, Q Liu, D Wu, Q Zhang, Y Li… - Science, 2023 - science.org
Learning is highly important for edge intelligence devices to adapt to different application
scenarios and owners. Current technologies for training neural networks require moving …

All-analog photoelectronic chip for high-speed vision tasks

Y Chen, M Nazhamaiti, H Xu, Y Meng, T Zhou, G Li… - Nature, 2023 - nature.com
Photonic computing enables faster and more energy-efficient processing of vision data.
However, experimental superiority of deployable systems remains a challenge because of …

Flashattention: Fast and memory-efficient exact attention with io-awareness

T Dao, D Fu, S Ermon, A Rudra… - Advances in neural …, 2022 - proceedings.neurips.cc
Transformers are slow and memory-hungry on long sequences, since the time and memory
complexity of self-attention are quadratic in sequence length. Approximate attention …

Training compute-optimal large language models

J Hoffmann, S Borgeaud, A Mensch… - arXiv preprint arXiv …, 2022 - arxiv.org
We investigate the optimal model size and number of tokens for training a transformer
language model under a given compute budget. We find that current large language models …

End-to-end speech recognition: A survey

R Prabhavalkar, T Hori, TN Sainath… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
In the last decade of automatic speech recognition (ASR) research, the introduction of deep
learning has brought considerable reductions in word error rate of more than 50% relative …

Mip-nerf 360: Unbounded anti-aliased neural radiance fields

JT Barron, B Mildenhall, D Verbin… - Proceedings of the …, 2022 - openaccess.thecvf.com
Though neural radiance fields ("NeRF") have demonstrated impressive view synthesis
results on objects and small bounded regions of space, they struggle on "unbounded" …

Photonic matrix multiplication lights up photonic accelerator and beyond

H Zhou, J Dong, J Cheng, W Dong, C Huang… - Light: Science & …, 2022 - nature.com
Matrix computation, as a fundamental building block of information processing in science
and technology, contributes most of the computational overheads in modern signal …