Large language models for human–robot interaction: A review

C Zhang, J Chen, J Li, Y Peng, Z Mao - Biomimetic Intelligence and …, 2023 - Elsevier
The fusion of large language models and robotic systems has introduced a transformative
paradigm in human–robot interaction, offering unparalleled capabilities in natural language …

A survey of transformers

T Lin, Y Wang, X Liu, X Qiu - AI open, 2022 - Elsevier
Transformers have achieved great success in many artificial intelligence fields, such as
natural language processing, computer vision, and audio processing. Therefore, it is natural …

Fast inference from transformers via speculative decoding

Y Leviathan, M Kalman… - … Conference on Machine …, 2023 - proceedings.mlr.press
Inference from large autoregressive models like Transformers is slow: decoding K tokens
takes K serial runs of the model. In this work we introduce speculative decoding, an …
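The accept/reject rule at the heart of speculative decoding is simple enough to sketch. The toy NumPy example below is an illustration only, not the paper's implementation: target_dist and draft_dist are hypothetical stand-ins for the large and small models, and the vocabulary is tiny. A drafted token x is kept with probability min(1, p(x)/q(x)); on the first rejection a replacement is drawn from the normalised residual max(0, p - q), which keeps the output distributed exactly as the target p.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = 8  # toy vocabulary size

def target_dist(prefix):
    """Stand-in for the large model p(. | prefix): a deterministic toy categorical."""
    logits = np.sin(np.arange(VOCAB) + len(prefix))
    e = np.exp(logits - logits.max())
    return e / e.sum()

def draft_dist(prefix):
    """Stand-in for the small draft model q(. | prefix), deliberately similar to p."""
    logits = np.sin(np.arange(VOCAB) + len(prefix)) * 0.8
    e = np.exp(logits - logits.max())
    return e / e.sum()

def speculative_step(prefix, K=4):
    """Draft K tokens with q, then accept/reject each against p."""
    drafted = []
    for _ in range(K):
        q = draft_dist(prefix + drafted)
        drafted.append(int(rng.choice(VOCAB, p=q)))

    accepted = []
    for x in drafted:
        p = target_dist(prefix + accepted)
        q = draft_dist(prefix + accepted)
        if rng.random() < min(1.0, p[x] / q[x]):
            accepted.append(x)                 # kept; output still follows p
        else:
            residual = np.maximum(p - q, 0.0)  # resample from normalised residual
            residual /= residual.sum()
            accepted.append(int(rng.choice(VOCAB, p=residual)))
            return accepted                    # stop at the first rejection
    # all K drafts accepted: sample one bonus token directly from p
    p = target_dist(prefix + accepted)
    accepted.append(int(rng.choice(VOCAB, p=p)))
    return accepted

print(speculative_step([1, 2, 3]))
```

The speed-up comes from the fact that all K drafted positions can be scored by the target model in one parallel pass, so accepting several tokens amortises a single serial run.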

Efficiently scaling transformer inference

R Pope, S Douglas, A Chowdhery… - Proceedings of …, 2023 - proceedings.mlsys.org
We study the problem of efficient generative inference for Transformer models, in one of its
most challenging settings: large deep models, with tight latency targets and long sequence …

FlashAttention: Fast and memory-efficient exact attention with IO-awareness

T Dao, D Fu, S Ermon, A Rudra… - Advances in neural …, 2022 - proceedings.neurips.cc
Transformers are slow and memory-hungry on long sequences, since the time and memory
complexity of self-attention are quadratic in sequence length. Approximate attention …
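The quadratic cost named in the snippet comes from materialising the full N x N score matrix. The sketch below, a NumPy illustration only (the real FlashAttention gains come from tiling in GPU SRAM, not from NumPy), shows the underlying algorithmic idea: visit keys and values in blocks and compute the softmax online, carrying a running max and running denominator across blocks so the full score matrix never exists.

```python
import numpy as np

def naive_attention(Q, K, V):
    """Reference attention: materialises the full N x N score matrix."""
    S = Q @ K.T / np.sqrt(Q.shape[-1])        # the O(N^2) memory lives here
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    P /= P.sum(axis=-1, keepdims=True)
    return P @ V

def blocked_attention(Q, K, V, block=16):
    """Online-softmax attention over key/value blocks; no N x N matrix is stored."""
    d, N = Q.shape[-1], Q.shape[0]
    m = np.full(N, -np.inf)                   # running row-max of scores
    l = np.zeros(N)                           # running softmax denominator
    acc = np.zeros((N, V.shape[-1]))          # running unnormalised output
    for start in range(0, K.shape[0], block):
        Kb, Vb = K[start:start+block], V[start:start+block]
        S = Q @ Kb.T / np.sqrt(d)             # N x block, never N x N
        m_new = np.maximum(m, S.max(axis=-1))
        scale = np.exp(m - m_new)             # rescale old stats to the new max
        P = np.exp(S - m_new[:, None])
        l = l * scale + P.sum(axis=-1)
        acc = acc * scale[:, None] + P @ Vb
        m = m_new
    return acc / l[:, None]

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((64, 32)) for _ in range(3))
assert np.allclose(naive_attention(Q, K, V), blocked_attention(Q, K, V))
```

Because the blocked version is exact (not an approximation), it matches the naive result to floating-point precision, which is the "exact attention" claim in the title.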

ConViT: Improving vision transformers with soft convolutional inductive biases

S d'Ascoli, H Touvron, ML Leavitt… - International …, 2021 - proceedings.mlr.press
Convolutional architectures have proven extremely successful for vision tasks. Their hard
inductive biases enable sample-efficient learning, but come at the cost of a potentially lower …

Perceiver: General perception with iterative attention

A Jaegle, F Gimeno, A Brock… - International …, 2021 - proceedings.mlr.press
Biological systems understand the world by simultaneously processing high-dimensional
inputs from modalities as diverse as vision, audition, touch, proprioception, etc. The …

Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity

W Fedus, B Zoph, N Shazeer - Journal of Machine Learning Research, 2022 - jmlr.org
In deep learning, models typically reuse the same parameters for all inputs. Mixture of
Experts (MoE) models defy this and instead select different parameters for each incoming …
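The routing idea in the snippet is easy to make concrete. Below is a minimal, hypothetical NumPy sketch of Switch-style top-1 routing, not the paper's implementation: a learned router scores each token against the experts, the argmax expert alone processes the token, and the output is scaled by the gate probability. Capacity limits and the load-balancing loss from the paper are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
D, N_EXPERTS, TOKENS = 16, 4, 8

# One small linear "expert" per slot; only the routed expert runs for a token.
W_router = rng.standard_normal((D, N_EXPERTS)) * 0.1
experts = [rng.standard_normal((D, D)) * 0.1 for _ in range(N_EXPERTS)]

def switch_layer(x):
    """Top-1 (Switch-style) routing: the softmax router picks one expert per
    token; the output is scaled by the gate probability, which is what keeps
    the router differentiable in a real trained implementation."""
    logits = x @ W_router                      # (tokens, experts)
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    choice = probs.argmax(axis=-1)             # top-1 expert per token
    gate = probs[np.arange(len(x)), choice]    # gate value of the chosen expert
    out = np.empty_like(x)
    for e in range(N_EXPERTS):                 # each expert sees only its tokens
        mask = choice == e
        if mask.any():
            out[mask] = (x[mask] @ experts[e]) * gate[mask, None]
    return out, choice

x = rng.standard_normal((TOKENS, D))
y, routing = switch_layer(x)
print("tokens per expert:", np.bincount(routing, minlength=N_EXPERTS))
```

This is what lets parameter count grow with the number of experts while per-token compute stays roughly constant: each token pays for one expert, not all of them.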

MeMViT: Memory-augmented multiscale vision transformer for efficient long-term video recognition

CY Wu, Y Li, K Mangalam, H Fan… - Proceedings of the …, 2022 - openaccess.thecvf.com
While today's video recognition systems parse snapshots or short clips accurately, they
cannot yet connect the dots and reason across a longer range of time. Most existing video …

Attention mechanism in neural networks: where it comes and where it goes

D Soydaner - Neural Computing and Applications, 2022 - Springer
A long time ago in the machine learning literature, the idea of incorporating a mechanism
inspired by the human visual system into neural networks was introduced. This idea is …