Language model behavior: A comprehensive survey

TA Chang, BK Bergen - Computational Linguistics, 2024 - direct.mit.edu
Transformer language models have received widespread public attention, yet their
generated text is often surprising even to NLP researchers. In this survey, we discuss over …

Position information in transformers: An overview

P Dufter, M Schmitt, H Schütze - Computational Linguistics, 2022 - direct.mit.edu
Transformers are arguably the main workhorse in recent natural language processing
research. By definition, a Transformer is invariant with respect to reordering of the input …
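
The permutation-invariance point made in this overview is easy to verify directly. The sketch below is our own minimal illustration, not code from the paper: it assumes a toy self-attention layer with identity projections and the standard sinusoidal encodings of Vaswani et al. (2017), and shows that the layer alone is permutation-equivariant while the added encodings make the computation order-sensitive.

```python
# Minimal NumPy sketch (illustrative, not from the surveyed overview):
# self-attention without position information is permutation-equivariant;
# adding sinusoidal position encodings breaks that symmetry.
import numpy as np

def attention(x):
    """Toy self-attention with identity Q/K/V projections, for illustration."""
    scores = x @ x.T / np.sqrt(x.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x

def sinusoidal_pe(seq_len, d_model):
    """PE[pos, 2i] = sin(pos / 10000^(2i/d)), PE[pos, 2i+1] = cos(pos / 10000^(2i/d))."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model // 2)[None, :]
    angles = pos / np.power(10000.0, 2.0 * i / d_model)
    pe = np.empty((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))          # 5 tokens, model dimension 8
perm = rng.permutation(5)

# Without position encodings, reordering the input merely reorders the output.
print(np.allclose(attention(x)[perm], attention(x[perm])))              # True
# Sinusoidal encodings are tied to positions rather than tokens, so the
# reordered sequence no longer yields a reordered copy of the same output.
pe = sinusoidal_pe(5, 8)
print(np.allclose(attention(x + pe)[perm], attention(x[perm] + pe)))    # False
```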

DKPLM: decomposable knowledge-enhanced pre-trained language model for natural language understanding

T Zhang, C Wang, N Hu, M Qiu, C Tang, X He… - Proceedings of the …, 2022 - ojs.aaai.org
Knowledge-Enhanced Pre-trained Language Models (KEPLMs) are pre-trained
models with relation triples injected from knowledge graphs to improve language …
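
As a rough illustration of the knowledge-injection idea behind KEPLMs, the snippet below fuses knowledge-graph entity embeddings into the contextual states of tokens that mention those entities. It is a generic sketch of ours with hypothetical names and shapes, not DKPLM's decomposable mechanism.

```python
# Generic illustration (not DKPLM's method) of fusing KG entity embeddings
# into the contextual token representations of entity mentions.
import torch
import torch.nn as nn

class EntityFusion(nn.Module):
    def __init__(self, hidden_dim: int, entity_dim: int):
        super().__init__()
        # Project KG entity embeddings into the language model's hidden space.
        self.project = nn.Linear(entity_dim, hidden_dim)
        self.norm = nn.LayerNorm(hidden_dim)

    def forward(self, token_states, entity_emb, mention_mask):
        """
        token_states: (batch, seq_len, hidden_dim) contextual token states
        entity_emb:   (batch, seq_len, entity_dim) aligned entity embeddings
                      (zeros where no entity is linked)
        mention_mask: (batch, seq_len, 1) 1.0 for tokens inside an entity mention
        """
        injected = self.project(entity_emb) * mention_mask
        return self.norm(token_states + injected)

# Toy usage: 2 sentences of 6 tokens, hidden size 16, KG embeddings of size 8.
fusion = EntityFusion(hidden_dim=16, entity_dim=8)
tokens = torch.randn(2, 6, 16)
entities = torch.randn(2, 6, 8)
mask = torch.zeros(2, 6, 1)
mask[:, 2:4] = 1.0            # pretend tokens 2-3 mention a linked entity
print(fusion(tokens, entities, mask).shape)    # torch.Size([2, 6, 16])
```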

Monotonic location attention for length generalization

JR Chowdhury, C Caragea - International Conference on …, 2023 - proceedings.mlr.press
We explore different ways to utilize position-based cross-attention in seq2seq networks to
enable length generalization in algorithmic tasks. We show that a simple approach of …

Revisiting and advancing Chinese natural language understanding with accelerated heterogeneous knowledge pre-training

T Zhang, J Dong, J Wang, C Wang, A Wang… - arXiv preprint arXiv …, 2022 - arxiv.org
Recently, knowledge-enhanced pre-trained language models (KEPLMs) have improved context-
aware representations by learning from structured relations in knowledge graphs, and/or …

SeqNet: An efficient neural network for automatic malware detection

J Xu, W Fu, H Bu, Z Wang, L Ying - arXiv preprint arXiv:2205.03850, 2022 - arxiv.org
Malware continues to evolve rapidly, and more than 450,000 new samples are captured
every day, which makes manual malware analysis impractical. However, existing deep …

Word order matters when you increase masking

K Lasri, A Lenci, T Poibeau - arXiv preprint arXiv:2211.04427, 2022 - arxiv.org
Word order, an essential property of natural languages, is injected into Transformer-based
neural language models using position encoding. However, recent experiments have shown …

TRELM: Towards Robust and Efficient Pre-training for Knowledge-Enhanced Language Models

J Yan, C Wang, T Zhang, X He, J Huang… - arXiv preprint arXiv …, 2024 - arxiv.org
KEPLMs are pre-trained models that utilize external knowledge to enhance language
understanding. Previous language models facilitated knowledge acquisition by …

Bridging the gap between position-based and content-based self-attention for neural machine translation

F Schmidt, MA Di Gangi - … of the Eighth Conference on Machine …, 2023 - aclanthology.org
Position-based token-mixing approaches, such as FNet and MLPMixer, have been shown to be
exciting alternatives to attention for computer vision and natural language understanding. The …
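
The contrast this paper draws is easy to state in code. The sketch below is our own minimal illustration of FNet-style position-based token mixing, in which the content-based attention sublayer is replaced by a fixed Fourier transform over the sequence and hidden dimensions; it conveys only the core mixing operation, not the paper's model.

```python
# Minimal NumPy sketch of FNet-style position-based token mixing
# (Lee-Thorp et al., 2021): the self-attention sublayer is replaced by a
# fixed 2D Fourier transform, so tokens are mixed by position, not content.
import numpy as np

def fnet_mixing(x):
    """DFT over the hidden dimension, then over the sequence dimension,
    keeping only the real part of the result."""
    return np.fft.fft(np.fft.fft(x, axis=-1), axis=-2).real

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))       # 4 tokens, hidden size 8
print(fnet_mixing(x).shape)       # (4, 8)

# Because the mixing over tokens is fixed by position, swapping two tokens'
# contents does not simply swap their outputs, unlike content-based attention.
perm = np.array([1, 0, 2, 3])
print(np.allclose(fnet_mixing(x[perm]), fnet_mixing(x)[perm]))   # False
```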

Capturing natural position relationships: A neural differential equation approach

C Ji, L Wang, J Qin, X Kang, Z Wang - Pattern Recognition Letters, 2024 - Elsevier
The Transformer has emerged as the predominant model in Natural Language Processing
due to its exceptional performance in various sequence modeling tasks, particularly in …