Initializing Variable-sized Vision Transformers from Learngene with Learnable Transformation

S Xia, Y Zu, X Yang, X Geng - Advances in Neural …, 2025 - proceedings.neurips.cc
In practical scenarios, it is necessary to build variable-sized models to accommodate diverse
resource constraints, where weight initialization serves as a crucial step preceding training …

Superposed decoding: Multiple generations from a single autoregressive inference pass

E Shen, A Fan, SM Pratt, JS Park, M Wallingford… - arXiv preprint arXiv …, 2024 - arxiv.org
Many applications today provide users with multiple auto-complete drafts as they type,
including GitHub's code completion, Gmail's smart compose, and Apple's messaging auto …

Neural Metamorphosis

X Yang, X Wang - European Conference on Computer Vision, 2024 - Springer
This paper introduces a new learning paradigm termed Neural Metamorphosis (NeuMeta),
which aims to build self-morphable neural networks. Contrary to crafting separate models for …

Efficient stagewise pretraining via progressive subnetworks

A Panigrahi, N Saunshi, K Lyu, S Miryoosefi… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent developments in large language models have sparked interest in efficient
pretraining methods. Stagewise training approaches to improve efficiency, like gradual …

Progressive ensemble distillation: Building ensembles for efficient inference

D Dennis, A Shetty, AP Sevekari… - Advances in Neural …, 2023 - proceedings.neurips.cc
Knowledge distillation is commonly used to compress an ensemble of models into a
single model. In this work we study the problem of progressive ensemble distillation: Given a …

Starbucks: Improved Training for 2D Matryoshka Embeddings

S Zhuang, S Wang, B Koopman, G Zuccon - arXiv preprint arXiv …, 2024 - arxiv.org
Effective approaches that can scale embedding model depth (i.e., layers) and embedding size
allow for the creation of models that are highly scalable across different computational …
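
A minimal sketch of the embedding-size half of this idea, assuming a Matryoshka-style setup in which the first d dimensions of a trained embedding are kept and re-normalized to serve as a smaller embedding; the helper name and the dims values below are illustrative, not taken from the paper:

import numpy as np

def matryoshka_truncate(embedding, dims=(64, 128, 256, 768)):
    # Hypothetical helper: derive nested sub-embeddings by keeping only the
    # first d dimensions and re-normalizing (illustrative sketch only).
    embedding = np.asarray(embedding, dtype=np.float32)
    subs = {}
    for d in dims:
        sub = embedding[:d]
        subs[d] = sub / (np.linalg.norm(sub) + 1e-12)
    return subs

# Example: one full 768-d embedding yields usable 64/128/256/768-d variants.
full = np.random.randn(768).astype(np.float32)
nested = matryoshka_truncate(full)
print({d: v.shape for d, v in nested.items()})

The same nesting idea applied to model depth (taking representations from earlier layers) is the second axis the title's "2D" refers to.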

MatMamba: A Matryoshka State Space Model

A Shukla, S Vemprala, A Kusupati, A Kapoor - arXiv preprint arXiv …, 2024 - arxiv.org
State Space Models (SSMs) like Mamba2 are a promising alternative to Transformers, with
faster theoretical training and inference times--especially for long context lengths. Recent …

AdANNS: A framework for adaptive semantic search

A Rege, A Kusupati, A Fan, Q Cao… - Advances in …, 2023 - proceedings.neurips.cc
Web-scale search systems learn an encoder to embed a given query which is then hooked
into an approximate nearest neighbor search (ANNS) pipeline to retrieve similar data points …
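
A minimal sketch of the pipeline this snippet describes (encode a query, then retrieve its nearest neighbors), assuming unit-normalized corpus embeddings and using exact brute-force cosine search as a stand-in for a real ANNS index; the encode and retrieve helpers are hypothetical placeholders for a learned encoder and an ANNS library:

import numpy as np

# Toy corpus embeddings; a web-scale system would store these in an ANNS index.
rng = np.random.default_rng(0)
corpus = rng.standard_normal((10_000, 256)).astype(np.float32)
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)

def encode(text: str) -> np.ndarray:
    # Stand-in for a learned query encoder (illustrative only).
    vec = rng.standard_normal(256).astype(np.float32)
    return vec / np.linalg.norm(vec)

def retrieve(query: str, k: int = 5):
    q = encode(query)
    scores = corpus @ q  # cosine similarity, since all vectors are unit-norm
    top = np.argpartition(-scores, k)[:k]
    return top[np.argsort(-scores[top])]

print(retrieve("example query"))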

When One LLM Drools, Multi-LLM Collaboration Rules

S Feng, W Ding, A Liu, Z Wang, W Shi, Y Wang… - arXiv preprint arXiv …, 2025 - arxiv.org
This position paper argues that in many realistic (i.e., complex, contextualized, subjective)
scenarios, one LLM is not enough to produce a reliable output. We challenge the status quo …

From Dense to Dynamic: Token-Difficulty Driven MoEfication of Pre-Trained LLMs

K Nishu, S Mehta, S Abnar, M Farajtabar… - arXiv preprint arXiv …, 2025 - arxiv.org
Training large language models (LLMs) for different inference constraints is computationally
expensive, limiting control over efficiency-accuracy trade-offs. Moreover, once trained, these …