Language model behavior: A comprehensive survey

TA Chang, BK Bergen - Computational Linguistics, 2024 - direct.mit.edu
Transformer language models have received widespread public attention, yet their
generated text is often surprising even to NLP researchers. In this survey, we discuss over …

Position information in transformers: An overview

P Dufter, M Schmitt, H Schütze - Computational Linguistics, 2022 - direct.mit.edu
Transformers are arguably the main workhorse in recent natural language processing
research. By definition, a Transformer is invariant with respect to reordering of the input …
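
The permutation-invariance point made in this overview is easy to verify directly. The sketch below is our own minimal illustration, not code from the paper: it assumes a toy self-attention layer with identity projections and the standard sinusoidal encodings of Vaswani et al. (2017), and shows that the layer alone is permutation-equivariant while the added encodings make the computation order-sensitive.

```python
# Minimal NumPy sketch (illustrative, not from the surveyed overview):
# self-attention without position information is permutation-equivariant;
# adding sinusoidal position encodings breaks that symmetry.
import numpy as np

def attention(x):
    """Toy self-attention with identity Q/K/V projections, for illustration."""
    scores = x @ x.T / np.sqrt(x.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x

def sinusoidal_pe(seq_len, d_model):
    """PE[pos, 2i] = sin(pos / 10000^(2i/d)), PE[pos, 2i+1] = cos(pos / 10000^(2i/d))."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model // 2)[None, :]
    angles = pos / np.power(10000.0, 2.0 * i / d_model)
    pe = np.empty((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))          # 5 tokens, model dimension 8
perm = rng.permutation(5)

# Without position encodings, reordering the input merely reorders the output.
print(np.allclose(attention(x)[perm], attention(x[perm])))              # True
# Sinusoidal encodings are tied to positions rather than tokens, so the
# reordered sequence no longer yields a reordered copy of the same output.
pe = sinusoidal_pe(5, 8)
print(np.allclose(attention(x + pe)[perm], attention(x[perm] + pe)))    # False
```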

DKPLM: decomposable knowledge-enhanced pre-trained language model for natural language understanding

T Zhang, C Wang, N Hu, M Qiu, C Tang, X He… - Proceedings of the …, 2022 - ojs.aaai.org
Knowledge-Enhanced Pre-trained Language Models (KEPLMs) are pre-trained
models with relation triples injected from knowledge graphs to improve language …
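
As a rough illustration of the knowledge-injection idea behind KEPLMs, the snippet below fuses knowledge-graph entity embeddings into the contextual states of tokens that mention those entities. It is a generic sketch of ours with hypothetical names and shapes, not DKPLM's decomposable mechanism.

```python
# Generic illustration (not DKPLM's method) of fusing KG entity embeddings
# into the contextual token representations of entity mentions.
import torch
import torch.nn as nn

class EntityFusion(nn.Module):
    def __init__(self, hidden_dim: int, entity_dim: int):
        super().__init__()
        # Project KG entity embeddings into the language model's hidden space.
        self.project = nn.Linear(entity_dim, hidden_dim)
        self.norm = nn.LayerNorm(hidden_dim)

    def forward(self, token_states, entity_emb, mention_mask):
        """
        token_states: (batch, seq_len, hidden_dim) contextual token states
        entity_emb:   (batch, seq_len, entity_dim) aligned entity embeddings
                      (zeros where no entity is linked)
        mention_mask: (batch, seq_len, 1) 1.0 for tokens inside an entity mention
        """
        injected = self.project(entity_emb) * mention_mask
        return self.norm(token_states + injected)

# Toy usage: 2 sentences of 6 tokens, hidden size 16, KG embeddings of size 8.
fusion = EntityFusion(hidden_dim=16, entity_dim=8)
tokens = torch.randn(2, 6, 16)
entities = torch.randn(2, 6, 8)
mask = torch.zeros(2, 6, 1)
mask[:, 2:4] = 1.0            # pretend tokens 2-3 mention a linked entity
print(fusion(tokens, entities, mask).shape)    # torch.Size([2, 6, 16])
```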

Monotonic location attention for length generalization

JR Chowdhury, C Caragea - International Conference on …, 2023 - proceedings.mlr.press
We explore different ways to utilize position-based cross-attention in seq2seq networks to
enable length generalization in algorithmic tasks. We show that a simple approach of …

Revisiting and advancing Chinese natural language understanding with accelerated heterogeneous knowledge pre-training

T Zhang, J Dong, J Wang, C Wang, A Wang… - arXiv preprint arXiv …, 2022 - arxiv.org
Recently, knowledge-enhanced pre-trained language models (KEPLMs) have improved context-
aware representations by learning from structured relations in knowledge graphs, and/or …

SeqNet: An efficient neural network for automatic malware detection

J Xu, W Fu, H Bu, Z Wang, L Ying - arXiv preprint arXiv:2205.03850, 2022 - arxiv.org
Malware continues to evolve rapidly, and more than 450,000 new samples are captured
every day, which makes manual malware analysis impractical. However, existing deep …

Word order matters when you increase masking

K Lasri, A Lenci, T Poibeau - arXiv preprint arXiv:2211.04427, 2022 - arxiv.org
Word order, an essential property of natural languages, is injected into Transformer-based
neural language models using position encoding. However, recent experiments have shown …

TRELM: Towards Robust and Efficient Pre-training for Knowledge-Enhanced Language Models

J Yan, C Wang, T Zhang, X He, J Huang… - arXiv preprint arXiv …, 2024 - arxiv.org
KEPLMs are pre-trained models that utilize external knowledge to enhance language
understanding. Previous language models facilitated knowledge acquisition by …

Bridging the gap between position-based and content-based self-attention for neural machine translation

F Schmidt, MA Di Gangi - … of the Eighth Conference on Machine …, 2023 - aclanthology.org
Position-based token-mixing approaches, such as FNet and MLPMixer, have been shown to be
exciting alternatives to attention for computer vision and natural language understanding. The …
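
The contrast this paper draws is easy to state in code. The sketch below is our own minimal illustration of FNet-style position-based token mixing, in which the content-based attention sublayer is replaced by a fixed Fourier transform over the sequence and hidden dimensions; it conveys only the core mixing operation, not the paper's model.

```python
# Minimal NumPy sketch of FNet-style position-based token mixing
# (Lee-Thorp et al., 2021): the self-attention sublayer is replaced by a
# fixed 2D Fourier transform, so tokens are mixed by position, not content.
import numpy as np

def fnet_mixing(x):
    """DFT over the hidden dimension, then over the sequence dimension,
    keeping only the real part of the result."""
    return np.fft.fft(np.fft.fft(x, axis=-1), axis=-2).real

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))       # 4 tokens, hidden size 8
print(fnet_mixing(x).shape)       # (4, 8)

# Because the mixing over tokens is fixed by position, swapping two tokens'
# contents does not simply swap their outputs, unlike content-based attention.
perm = np.array([1, 0, 2, 3])
print(np.allclose(fnet_mixing(x[perm]), fnet_mixing(x)[perm]))   # False
```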

Capturing natural position relationships: A neural differential equation approach

C Ji, L Wang, J Qin, X Kang, Z Wang - Pattern Recognition Letters, 2024 - Elsevier
The Transformer has emerged as the predominant model in Natural Language Processing
due to its exceptional performance in various sequence modeling tasks, particularly in …