Be like a goldfish, don't memorize! Mitigating memorization in generative LLMs

A Hans, J Kirchenbauer, Y Wen… - Advances in …, 2025 - proceedings.neurips.cc
Large language models can memorize and repeat their training data, causing privacy and
copyright risks. To mitigate memorization, we introduce a subtle modification to the next …
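
The snippet above is truncated, but its premise (altering the next-token training objective so that some tokens never contribute a memorization signal) lends itself to a small illustration. Below is a minimal PyTorch sketch of a token-dropping next-token loss; the function name goldfish_style_loss, the drop interval k, and the simple every-k-th-token rule are illustrative assumptions, not the paper's exact procedure.

```python
import torch
import torch.nn.functional as F

def goldfish_style_loss(logits, labels, k=4):
    """Next-token loss that ignores every k-th target token (hedged sketch).

    logits: (batch, seq_len, vocab); labels: (batch, seq_len)
    """
    vocab = logits.shape[-1]
    # Standard next-token shift: position t predicts token t+1.
    shift_logits = logits[:, :-1, :]
    shift_labels = labels[:, 1:].clone()
    # Drop every k-th target position from the loss via the ignore index,
    # so the model never receives a signal to reproduce those tokens verbatim.
    pos = torch.arange(shift_labels.shape[1])
    shift_labels[:, pos % k == (k - 1)] = -100
    return F.cross_entropy(
        shift_logits.reshape(-1, vocab), shift_labels.reshape(-1), ignore_index=-100
    )

# Toy usage with random tensors.
logits = torch.randn(2, 16, 100)
labels = torch.randint(0, 100, (2, 16))
print(goldfish_style_loss(logits, labels).item())
```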

Loki: Low-rank keys for efficient sparse attention

P Singhania, S Singh, S He, S Feizi… - Advances in Neural …, 2025 - proceedings.neurips.cc
Inference on large language models (LLMs) can be expensive in terms of the compute and
memory costs involved, especially when long sequence lengths are used. In particular, the …
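
As a rough illustration of the idea hinted at in the snippet (keys living near a low-dimensional subspace, so cheap low-rank scores can pick a sparse set of keys before exact attention), here is a hedged PyTorch sketch. The function low_rank_sparse_attention, the SVD-based projection, and the rank/top_k parameters are assumptions for illustration and may differ from the paper's actual method.

```python
import torch

def low_rank_sparse_attention(q, k, v, rank=8, top_k=32):
    """q: (n_q, d), k/v: (n_kv, d). Returns (n_q, d). Hedged sketch only."""
    d = q.shape[-1]
    # Low-rank basis for the key space (top `rank` right singular vectors).
    _, _, vh = torch.linalg.svd(k, full_matrices=False)
    basis = vh[:rank].T                             # (d, rank)
    # Cheap approximate scores computed in the low-rank space.
    approx_scores = (q @ basis) @ (k @ basis).T     # (n_q, n_kv)
    idx = approx_scores.topk(top_k, dim=-1).indices
    # Exact attention restricted to the selected keys/values.
    k_sel, v_sel = k[idx], v[idx]                   # (n_q, top_k, d)
    scores = torch.einsum("qd,qkd->qk", q, k_sel) / d ** 0.5
    weights = scores.softmax(dim=-1)
    return torch.einsum("qk,qkd->qd", weights, v_sel)

# Toy usage: 4 queries attending sparsely over 256 key/value pairs.
q, k, v = torch.randn(4, 64), torch.randn(256, 64), torch.randn(256, 64)
print(low_rank_sparse_attention(q, k, v).shape)  # torch.Size([4, 64])
```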

A hybrid tensor-expert-data parallelism approach to optimize mixture-of-experts training

S Singh, O Ruwase, AA Awan, S Rajbhandari… - Proceedings of the 37th …, 2023 - dl.acm.org
Mixture-of-Experts (MoE) is a neural network architecture that adds sparsely activated expert
blocks to a base model, increasing the number of parameters without impacting …
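
To make the "sparsely activated expert blocks" concrete, the sketch below shows a minimal top-1-routed MoE layer in PyTorch. The class name TinyMoE, the top-1 routing rule, and all sizes are illustrative assumptions; the paper itself concerns the parallelization strategy for training such models, which this sketch does not cover.

```python
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    """Minimal sparsely-activated MoE block with top-1 routing (hedged sketch)."""

    def __init__(self, d_model=64, d_ff=256, n_experts=4):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                         # x: (n_tokens, d_model)
        gate = self.router(x).softmax(dim=-1)     # (n_tokens, n_experts)
        top_p, top_i = gate.max(dim=-1)           # top-1 expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = top_i == e
            if mask.any():
                # Each token runs through only its selected expert,
                # scaled by the router probability.
                out[mask] = top_p[mask, None] * expert(x[mask])
        return out

moe = TinyMoE()
print(moe(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```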

Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach

J Geiping, S McLeish, N Jain, J Kirchenbauer… - arxiv preprint arxiv …, 2025 - arxiv.org
We study a novel language model architecture that is capable of scaling test-time
computation by implicitly reasoning in latent space. Our model works by iterating a recurrent …
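
A hedged sketch of the recurrent-depth idea follows: the same shared-weight core block is iterated in latent space, so scaling test-time compute simply means running more iterations. The class RecurrentDepthSketch, the use of a stock nn.TransformerEncoderLayer as the core, and the noise-initialized latent state are stand-ins for the paper's actual architecture.

```python
import torch
import torch.nn as nn

class RecurrentDepthSketch(nn.Module):
    """Iterate one shared-weight block in latent space (hedged sketch)."""

    def __init__(self, d_model=64):
        super().__init__()
        self.embed = nn.Linear(d_model, d_model)     # stand-in "prelude"
        self.core = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.readout = nn.Linear(d_model, d_model)   # stand-in "coda"

    def forward(self, x, num_steps=8):
        # Start from a random latent state and refine it for `num_steps`
        # iterations of the same core block, re-injecting the input each step.
        e = self.embed(x)
        s = torch.randn_like(e)
        for _ in range(num_steps):
            s = self.core(s + e)
        return self.readout(s)

model = RecurrentDepthSketch()
x = torch.randn(2, 16, 64)   # (batch, seq_len, d_model)
# More test-time compute is just more iterations; shapes are unchanged.
print(model(x, num_steps=4).shape, model(x, num_steps=32).shape)
```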

A survey and empirical evaluation of parallel deep learning frameworks

D Nichols, S Singh, SH Lin, A Bhatele - arxiv preprint arxiv:2111.04949, 2021 - arxiv.org
The field of deep learning has witnessed a remarkable shift towards extremely compute- and
memory-intensive neural networks. These newer larger models have enabled researchers …