Pre-trained models: Past, present and future

X Han, Z Zhang, N Ding, Y Gu, X Liu, Y Huo, J Qiu… - AI Open, 2021 - Elsevier
Large-scale pre-trained models (PTMs) such as BERT and GPT have recently achieved
great success and become a milestone in the field of artificial intelligence (AI). Owing to …

Data-driven building load prediction and large language models: Comprehensive overview

Y Zhang, D Wang, G Wang, P Xu, Y Zhu - Energy and Buildings, 2024 - Elsevier
Building load forecasting is essential for optimizing the architectural design and managing
energy efficiently, enhancing the performance of Heating, Ventilation, and Air Conditioning …

Not all tokens are what you need for pretraining

Z Lin, Z Gou, Y Gong, X Liu, R Xu… - Advances in …, 2025 - proceedings.neurips.cc
Previous language model pre-training methods have uniformly applied a next-token
prediction loss to all training tokens. Challenging this norm, we posit that "Not all tokens in a …
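For context, a minimal sketch of the standard objective this snippet challenges: a next-token cross-entropy loss applied uniformly to every training token. This is an illustration only (a PyTorch-style setup is assumed; the function name, shapes, and the optional weighting slot are not from the paper), with the non-uniform weights showing where a selective scheme would differ.

```python
# Illustrative sketch: uniform next-token prediction loss vs. an optional
# per-token weighting. Assumed setup, not code from the cited paper.
from typing import Optional
import torch
import torch.nn.functional as F

def next_token_loss(logits: torch.Tensor,
                    input_ids: torch.Tensor,
                    token_weights: Optional[torch.Tensor] = None) -> torch.Tensor:
    """logits: (batch, seq_len, vocab); input_ids: (batch, seq_len)."""
    # Shift so that position t predicts token t+1.
    shift_logits = logits[:, :-1, :].reshape(-1, logits.size(-1))
    shift_labels = input_ids[:, 1:].reshape(-1)
    per_token = F.cross_entropy(shift_logits, shift_labels, reduction="none")
    if token_weights is None:
        # Uniform treatment: every training token contributes equally.
        return per_token.mean()
    # A selective scheme would instead down-weight or drop some tokens
    # via non-uniform weights (hypothetical weighting for illustration).
    w = token_weights[:, 1:].reshape(-1).float()
    return (per_token * w).sum() / w.sum().clamp(min=1.0)
```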

CLEVE: contrastive pre-training for event extraction

Z Wang, X Wang, X Han, Y Lin, L Hou, Z Liu… - arXiv preprint arXiv …, 2021 - arxiv.org
Event extraction (EE) has considerably benefited from pre-trained language models (PLMs)
by fine-tuning. However, existing pre-training methods have not involved modeling event …

A novel neural network model fusion approach for improving medical named entity recognition in online health expert question-answering services

Z Hu, X Ma - Expert Systems with Applications, 2023 - Elsevier
Because of the frequent occurrence of chronic diseases, the COVID-19 pandemic, etc.,
online health expert question-answering (HQA) services have been unable to cope with the …

Continual knowledge distillation for neural machine translation

Y Zhang, P Li, M Sun, Y Liu - … of the 61st Annual Meeting of the …, 2023 - aclanthology.org
While many parallel corpora are not publicly accessible for data copyright, data privacy and
competitive differentiation reasons, trained translation models are increasingly available on …

“len or index or count, anything but v1”: Predicting variable names in decompilation output with transfer learning

KK Pal, AP Bajaj, P Banerjee, A Dutcher… - 2024 IEEE Symposium …, 2024 - yancomm.net
Binary reverse engineering is an arduous and tedious task performed by skilled and
expensive human analysts. Information about the source code is irrevocably lost in the …

EntityBERT: Entity-centric masking strategy for model pretraining for the clinical domain

C Lin, T Miller, D Dligach, S Bethard, G Savova - 2021 - repository.arizona.edu
Transformer-based neural language models have led to breakthroughs for a variety of
natural language processing (NLP) tasks. However, most models are pretrained on general …

Teaching the pre-trained model to generate simple texts for text simplification

R Sun, W Xu, X Wan - arXiv preprint arXiv:2305.12463, 2023 - arxiv.org
Randomly masking text spans in ordinary texts in the pre-training stage hardly allows
models to acquire the ability to generate simple texts. It can hurt the performance of pre …
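As an illustration of the generic pre-training step this snippet argues is insufficient, here is a minimal sketch of random span masking over ordinary text: contiguous spans are replaced with mask placeholders for the model to reconstruct. The helper name, span-length distribution, and 15% ratio are assumptions for the sketch, not details from the paper.

```python
# Illustrative random span masking; assumed parameters, not the paper's code.
import random

def mask_random_spans(tokens, mask_ratio=0.15, mean_span_len=3, mask_token="<mask>"):
    """Replace roughly `mask_ratio` of the tokens with mask placeholders, span by span."""
    tokens = list(tokens)
    target = max(1, int(len(tokens) * mask_ratio))
    masked, attempts = 0, 0
    while masked < target and attempts < 10 * len(tokens):
        attempts += 1
        span_len = max(1, int(random.expovariate(1.0 / mean_span_len)))
        start = random.randrange(len(tokens))
        for i in range(start, min(start + span_len, len(tokens))):
            if tokens[i] != mask_token:
                tokens[i] = mask_token
                masked += 1
    return tokens

# Example: the corrupted sequence serves as model input; the original tokens
# (or just the masked spans) serve as the reconstruction target.
print(mask_random_spans("the quick brown fox jumps over the lazy dog".split()))
```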

MoCA: Incorporating domain pretraining and cross attention for textbook question answering

F Xu, Q Lin, J Liu, L Zhang, T Zhao, Q Chai, Y Pan… - Pattern Recognition, 2023 - Elsevier
Textbook Question Answering (TQA) is a complex multimodal task to infer answers
given large context descriptions and abundant diagrams. Compared with Visual Question …