Conv-basis: A new paradigm for efficient attention inference and gradient computation in transformers

Y Liang, H Liu, Z Shi, Z Song, Z Xu, J Yin - arXiv preprint arXiv:2405.05219, 2024 - arxiv.org
The self-attention mechanism is the key to the success of transformers in recent Large
Language Models (LLMs). However, the quadratic computational cost $O(n^2)$ in the …

Fourier circuits in neural networks: Unlocking the potential of large language models in mathematical reasoning and modular arithmetic

J Gu, C Li, Y Liang, Z Shi, Z Song… - arXiv preprint arXiv …, 2024 - openreview.net
In the evolving landscape of machine learning, a pivotal challenge lies in deciphering the
internal representations harnessed by neural networks and Transformers. Building on recent …

An improved sample complexity for rank-1 matrix sensing

Y Deng, Z Li, Z Song - arXiv preprint arXiv:2303.06895, 2023 - arxiv.org
Matrix sensing is a problem in signal processing and machine learning that involves
recovering a low-rank matrix from a set of linear measurements. The goal is to reconstruct …

Quantum phase estimation by compressed sensing

C Yi, C Zhou, J Takahashi - Quantum, 2024 - quantum-journal.org
As a signal recovery algorithm, compressed sensing is particularly effective when the data
has low complexity and samples are scarce, which aligns naturally with the task of quantum …

The ESPRIT algorithm under high noise: Optimal error scaling and noisy super-resolution

Z Ding, EN Epperly, L Lin, R Zhang - arXiv preprint arXiv:2404.03885, 2024 - arxiv.org
Subspace-based signal processing techniques, such as the Estimation of Signal Parameters
via Rotational Invariant Techniques (ESPRIT) algorithm, are popular methods for spectral …

Federated Empirical Risk Minimization via Second-Order Method

S Bian, Z Song, J Yin - arXiv preprint arXiv:2305.17482, 2023 - arxiv.org
Many convex optimization problems with important applications in machine learning are
formulated as empirical risk minimization (ERM). There are several examples: linear and …

Query complexity of active learning for function family with nearly orthogonal basis

X Chen, Z Song, B Sun, J Yin, D Zhuo - arXiv preprint arXiv:2306.03356, 2023 - arxiv.org
Many machine learning algorithms require large numbers of labeled data to deliver state-of-
the-art results. In applications such as medical diagnosis and fraud detection, though there …

Efficient asynchronize stochastic gradient algorithm with structured data

Z Song, M Ye - arXiv preprint arXiv:2305.08001, 2023 - arxiv.org
Deep learning has achieved impressive success in a variety of fields because of its good
generalization. However, it has been a challenging problem to quickly train a neural network …

Super-resolution and robust sparse continuous Fourier transform in any constant dimension: Nearly linear time and sample complexity

Y Jin, D Liu, Z Song - Proceedings of the 2023 Annual ACM-SIAM …, 2023 - SIAM
The ability to resolve detail in the object that is being imaged, named by resolution, is the
core parameter of an imaging system. Super-resolution is a class of techniques that can …

Metric Transforms and Low Rank Representations of Kernels for Fast Attention

TZA Chu, J Alman, G Miller, S Narayanan… - The Thirty-eighth Annual … - openreview.net
We introduce a new linear-algebraic tool based on group representation theory, and use it to
address three key problems in machine learning. 1. Past researchers have proposed fast …