Be like a goldfish, don't memorize! Mitigating memorization in generative LLMs

A Hans, J Kirchenbauer, Y Wen… - Advances in …, 2025 - proceedings.neurips.cc
Large language models can memorize and repeat their training data, causing privacy and
copyright risks. To mitigate memorization, we introduce a subtle modification to the next …
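
The snippet above is truncated, but its premise (altering the next-token training objective so that some tokens never contribute a memorization signal) lends itself to a small illustration. Below is a minimal PyTorch sketch of a token-dropping next-token loss; the function name goldfish_style_loss, the drop interval k, and the simple every-k-th-token rule are illustrative assumptions, not the paper's exact procedure.

```python
import torch
import torch.nn.functional as F

def goldfish_style_loss(logits, labels, k=4):
    """Next-token loss that ignores every k-th target token (hedged sketch).

    logits: (batch, seq_len, vocab); labels: (batch, seq_len)
    """
    vocab = logits.shape[-1]
    # Standard next-token shift: position t predicts token t+1.
    shift_logits = logits[:, :-1, :]
    shift_labels = labels[:, 1:].clone()
    # Drop every k-th target position from the loss via the ignore index,
    # so the model never receives a signal to reproduce those tokens verbatim.
    pos = torch.arange(shift_labels.shape[1])
    shift_labels[:, pos % k == (k - 1)] = -100
    return F.cross_entropy(
        shift_logits.reshape(-1, vocab), shift_labels.reshape(-1), ignore_index=-100
    )

# Toy usage with random tensors.
logits = torch.randn(2, 16, 100)
labels = torch.randint(0, 100, (2, 16))
print(goldfish_style_loss(logits, labels).item())
```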

Loki: Low-rank keys for efficient sparse attention

P Singhania, S Singh, S He, S Feizi… - Advances in Neural …, 2025 - proceedings.neurips.cc
Inference on large language models (LLMs) can be expensive in terms of the compute and
memory costs involved, especially when long sequence lengths are used. In particular, the …
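
As a rough illustration of the idea hinted at in the snippet (keys living near a low-dimensional subspace, so cheap low-rank scores can pick a sparse set of keys before exact attention), here is a hedged PyTorch sketch. The function low_rank_sparse_attention, the SVD-based projection, and the rank/top_k parameters are assumptions for illustration and may differ from the paper's actual method.

```python
import torch

def low_rank_sparse_attention(q, k, v, rank=8, top_k=32):
    """q: (n_q, d), k/v: (n_kv, d). Returns (n_q, d). Hedged sketch only."""
    d = q.shape[-1]
    # Low-rank basis for the key space (top `rank` right singular vectors).
    _, _, vh = torch.linalg.svd(k, full_matrices=False)
    basis = vh[:rank].T                             # (d, rank)
    # Cheap approximate scores computed in the low-rank space.
    approx_scores = (q @ basis) @ (k @ basis).T     # (n_q, n_kv)
    idx = approx_scores.topk(top_k, dim=-1).indices
    # Exact attention restricted to the selected keys/values.
    k_sel, v_sel = k[idx], v[idx]                   # (n_q, top_k, d)
    scores = torch.einsum("qd,qkd->qk", q, k_sel) / d ** 0.5
    weights = scores.softmax(dim=-1)
    return torch.einsum("qk,qkd->qd", weights, v_sel)

# Toy usage: 4 queries attending sparsely over 256 key/value pairs.
q, k, v = torch.randn(4, 64), torch.randn(256, 64), torch.randn(256, 64)
print(low_rank_sparse_attention(q, k, v).shape)  # torch.Size([4, 64])
```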

A hybrid tensor-expert-data parallelism approach to optimize mixture-of-experts training

S Singh, O Ruwase, AA Awan, S Rajbhandari… - Proceedings of the 37th …, 2023 - dl.acm.org
Mixture-of-Experts (MoE) is a neural network architecture that adds sparsely activated expert
blocks to a base model, increasing the number of parameters without impacting …
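
To make the "sparsely activated expert blocks" concrete, the sketch below shows a minimal top-1-routed MoE layer in PyTorch. The class name TinyMoE, the top-1 routing rule, and all sizes are illustrative assumptions; the paper itself concerns the parallelization strategy for training such models, which this sketch does not cover.

```python
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    """Minimal sparsely-activated MoE block with top-1 routing (hedged sketch)."""

    def __init__(self, d_model=64, d_ff=256, n_experts=4):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                         # x: (n_tokens, d_model)
        gate = self.router(x).softmax(dim=-1)     # (n_tokens, n_experts)
        top_p, top_i = gate.max(dim=-1)           # top-1 expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = top_i == e
            if mask.any():
                # Each token runs through only its selected expert,
                # scaled by the router probability.
                out[mask] = top_p[mask, None] * expert(x[mask])
        return out

moe = TinyMoE()
print(moe(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```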

Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach

J Geiping, S McLeish, N Jain, J Kirchenbauer… - arxiv preprint arxiv …, 2025 - arxiv.org
We study a novel language model architecture that is capable of scaling test-time
computation by implicitly reasoning in latent space. Our model works by iterating a recurrent …
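
A hedged sketch of the recurrent-depth idea follows: the same shared-weight core block is iterated in latent space, so scaling test-time compute simply means running more iterations. The class RecurrentDepthSketch, the use of a stock nn.TransformerEncoderLayer as the core, and the noise-initialized latent state are stand-ins for the paper's actual architecture.

```python
import torch
import torch.nn as nn

class RecurrentDepthSketch(nn.Module):
    """Iterate one shared-weight block in latent space (hedged sketch)."""

    def __init__(self, d_model=64):
        super().__init__()
        self.embed = nn.Linear(d_model, d_model)     # stand-in "prelude"
        self.core = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.readout = nn.Linear(d_model, d_model)   # stand-in "coda"

    def forward(self, x, num_steps=8):
        # Start from a random latent state and refine it for `num_steps`
        # iterations of the same core block, re-injecting the input each step.
        e = self.embed(x)
        s = torch.randn_like(e)
        for _ in range(num_steps):
            s = self.core(s + e)
        return self.readout(s)

model = RecurrentDepthSketch()
x = torch.randn(2, 16, 64)   # (batch, seq_len, d_model)
# More test-time compute is just more iterations; shapes are unchanged.
print(model(x, num_steps=4).shape, model(x, num_steps=32).shape)
```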

A survey and empirical evaluation of parallel deep learning frameworks

D Nichols, S Singh, SH Lin, A Bhatele - arxiv preprint arxiv:2111.04949, 2021 - arxiv.org
The field of deep learning has witnessed a remarkable shift towards extremely compute- and
memory-intensive neural networks. These newer larger models have enabled researchers …