Conv-basis: A new paradigm for efficient attention inference and gradient computation in transformers
The self-attention mechanism is the key to the success of transformers in recent Large
Language Models (LLMs). However, the quadratic computational cost $O(n^2)$ in the …
Fourier circuits in neural networks: Unlocking the potential of large language models in mathematical reasoning and modular arithmetic
In the evolving landscape of machine learning, a pivotal challenge lies in deciphering the
internal representations harnessed by neural networks and Transformers. Building on recent …
An improved sample complexity for rank-1 matrix sensing
Matrix sensing is a problem in signal processing and machine learning that involves
recovering a low-rank matrix from a set of linear measurements. The goal is to reconstruct …
Quantum phase estimation by compressed sensing
As a signal recovery algorithm, compressed sensing is particularly effective when the data
has low complexity and samples are scarce, which aligns naturally with the task of quantum …
The ESPRIT algorithm under high noise: Optimal error scaling and noisy super-resolution
Subspace-based signal processing techniques, such as the Estimation of Signal Parameters
via Rotational Invariance Techniques (ESPRIT) algorithm, are popular methods for spectral …
Federated Empirical Risk Minimization via Second-Order Method
Many convex optimization problems with important applications in machine learning are
formulated as empirical risk minimization (ERM). There are several examples: linear and …
Query complexity of active learning for function family with nearly orthogonal basis
Many machine learning algorithms require large amounts of labeled data to deliver state-of-
the-art results. In applications such as medical diagnosis and fraud detection, though there …
Efficient asynchronize stochastic gradient algorithm with structured data
Z Song, M Ye - arxiv preprint arxiv:2305.08001, 2023 - arxiv.org
Deep learning has achieved impressive success in a variety of fields because of its good
generalization. However, it has been a challenging problem to quickly train a neural network …
Super-resolution and robust sparse continuous Fourier transform in any constant dimension: Nearly linear time and sample complexity
The ability to resolve detail in the object being imaged, termed resolution, is the
core parameter of an imaging system. Super-resolution is a class of techniques that can …
Metric Transforms and Low Rank Representations of Kernels for Fast Attention
We introduce a new linear-algebraic tool based on group representation theory, and use it to
address three key problems in machine learning. 1. Past researchers have proposed fast …