Deep neural networks and tabular data: A survey
Heterogeneous tabular data are the most commonly used form of data and are essential for
numerous critical and computationally demanding applications. On homogeneous datasets …
An attentive survey of attention models
The attention model has now become an important concept in neural networks and has been
researched within diverse application domains. This survey provides a structured and …
Surface form competition: Why the highest probability answer isn't always right
Large language models have shown promising results in zero-shot settings (Brown et al.,
2020; Radford et al., 2019). For example, they can perform multiple choice tasks simply by …
Efficient methods for natural language processing: A survey
Recent work in natural language processing (NLP) has yielded appealing results from
scaling model parameters and training data; however, using only scale to improve …
Adversarial sparse transformer for time series forecasting
Many approaches have been proposed for time series forecasting, in light of its significance
in a wide range of applications including business demand prediction. However, the existing methods …
scTab: scaling cross-tissue single-cell annotation models
Identifying cellular identities is a key use case in single-cell transcriptomics. While machine
learning has been leveraged to automate cell annotation predictions for some time, there …
Neural oblivious decision ensembles for deep learning on tabular data
Nowadays, deep neural networks (DNNs) have become the main instrument for machine
learning tasks within a wide range of domains, including vision, NLP, and speech …
A survey on green deep learning
In recent years, larger and deeper models have been springing up, continuously pushing
state-of-the-art (SOTA) results across various fields like natural language processing (NLP) and …
Adaptively sparse transformers
Attention mechanisms have become ubiquitous in NLP. Recent architectures, notably the
Transformer, learn powerful context-aware word representations through layered, multi …
DSelect-k: Differentiable selection in the mixture of experts with applications to multi-task learning
The Mixture-of-Experts (MoE) architecture is showing promising results in improving
parameter sharing in multi-task learning (MTL) and in scaling high-capacity neural networks …