Pre-trained models: Past, present and future
Large-scale pre-trained models (PTMs) such as BERT and GPT have recently achieved
great success and become a milestone in the field of artificial intelligence (AI). Owing to …
Data-driven building load prediction and large language models: Comprehensive overview
Y Zhang, D Wang, G Wang, P Xu, Y Zhu - Energy and Buildings, 2024 - Elsevier
Building load forecasting is essential for optimizing the architectural design and managing
energy efficiently, enhancing the performance of Heating, Ventilation, and Air Conditioning …
Not all tokens are what you need for pretraining
Previous language model pre-training methods have uniformly applied a next-token
prediction loss to all training tokens. Challenging this norm, we posit that "Not all tokens in a …
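The snippet above contrasts uniform next-token prediction with applying the loss only to selected tokens. A minimal sketch of that general idea is shown below; the per-token weight mask is a hypothetical placeholder, since the snippet does not say how the paper actually scores or selects tokens.

```python
import torch.nn.functional as F

def selective_lm_loss(logits, targets, token_weights):
    """Next-token prediction loss applied only to selected tokens.

    logits:        (batch, seq_len, vocab) model outputs
    targets:       (batch, seq_len) next-token ids
    token_weights: (batch, seq_len) 0/1 mask or soft weights choosing which
                   tokens contribute to the loss; deriving these weights is
                   the paper's contribution and is left abstract here.
    """
    per_token = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
        reduction="none",
    ).reshape(targets.shape)
    weighted = per_token * token_weights
    return weighted.sum() / token_weights.sum().clamp(min=1.0)
```

With `token_weights` set to all ones this reduces to the standard uniform loss, which makes the selective variant easy to compare against a baseline.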
CLEVE: contrastive pre-training for event extraction
Event extraction (EE) has considerably benefited from pre-trained language models (PLMs)
by fine-tuning. However, existing pre-training methods have not involved modeling event …
A novel neural network model fusion approach for improving medical named entity recognition in online health expert question-answering services
Z Hu, X Ma - Expert Systems with Applications, 2023 - Elsevier
Because of the frequent occurrence of chronic diseases, the COVID-19 pandemic, etc.,
online health expert question-answering (HQA) services have been unable to cope with the …
Continual knowledge distillation for neural machine translation
While many parallel corpora are not publicly accessible for data copyright, data privacy and
competitive differentiation reasons, trained translation models are increasingly available on …
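This entry is about reusing already-trained translation models as teachers when their corpora cannot be shared. As a rough illustration only, a generic word-level distillation objective (not the paper's specific continual scheme; `alpha`, `temperature`, and the padding handling are illustrative assumptions) could look like:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets,
                      pad_id, alpha=0.5, temperature=2.0):
    """Blend ground-truth cross-entropy with KL divergence to a teacher.

    student_logits, teacher_logits: (batch, seq_len, vocab)
    targets: (batch, seq_len) reference token ids
    Padding positions are ignored in the CE term only, for brevity.
    """
    ce = F.cross_entropy(
        student_logits.transpose(1, 2), targets, ignore_index=pad_id
    )
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    return alpha * ce + (1.0 - alpha) * kd
```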
“len or index or count, anything but v1”: Predicting variable names in decompilation output with transfer learning
Binary reverse engineering is an arduous and tedious task performed by skilled and
expensive human analysts. Information about the source code is irrevocably lost in the …
EntityBERT: Entity-centric masking strategy for model pretraining for the clinical domain
Transformer-based neural language models have led to breakthroughs for a variety of
natural language processing (NLP) tasks. However, most models are pretrained on general …
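The EntityBERT snippet points to masking entity mentions rather than random spans during pretraining. The sketch below illustrates span-level masking over pre-annotated entity offsets; the entity spans are assumed to come from an external tagger, and the masking rate is an illustrative value rather than the paper's setting.

```python
import random

MASK_TOKEN = "[MASK]"  # assumed BERT-style mask token

def mask_entity_spans(tokens, entity_spans, mask_prob=0.5):
    """Replace whole entity spans with [MASK] instead of random tokens.

    tokens:       list of wordpiece strings
    entity_spans: list of (start, end) indices marking entity mentions,
                  assumed to come from an upstream entity tagger
    mask_prob:    fraction of entity spans to mask (illustrative value)
    """
    masked = list(tokens)
    labels = [None] * len(tokens)  # original tokens at masked positions
    for start, end in entity_spans:
        if random.random() < mask_prob:
            for i in range(start, end):
                labels[i] = tokens[i]
                masked[i] = MASK_TOKEN
    return masked, labels

# Example: the entity span "aspirin" is masked as a single unit.
toks = ["the", "patient", "was", "given", "aspirin", "daily"]
print(mask_entity_spans(toks, [(4, 5)], mask_prob=1.0))
```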
Teaching the pre-trained model to generate simple texts for text simplification
Randomly masking text spans in ordinary texts in the pre-training stage hardly allows
models to acquire the ability to generate simple texts. It can hurt the performance of pre …
MoCA: Incorporating domain pretraining and cross attention for textbook question answering
Textbook Question Answering (TQA) is a complex multimodal task to infer answers
given large context descriptions and abundant diagrams. Compared with Visual Question …