Using DeepSpeed and Megatron to train Megatron-Turing NLG 530B, a large-scale generative language model

S Smith, M Patwary, B Norick, P LeGresley… - ar…

… through multi-source data and transformer

Y Chen, Q Yan - International Journal of Applied Earth Observation and …, 2024 - Elsevier
Abstract Cyclone Global Navigation Satellite System (CyGNSS) data are widely recognized
for their sensitivity to inland water bodies. However, the detection of water bodies using …

A large language model–based generative natural language processing framework fine‐tuned on clinical notes accurately extracts headache frequency from …

CC Chiang, M Luo, G Dumkrieger… - … : The Journal of …, 2024 - Wiley Online Library
Objective To develop a natural language processing (NLP) algorithm that can accurately
extract headache frequency from free‐text clinical notes. Background Headache frequency …

ncRNA Coding Potential Prediction Using BiLSTM and Transformer Encoder-Based Model

J Zhang, H Lu, Y Jiang, Y Ma… - Journal of Chemical …, 2024 - ACS Publications
Many noncoding RNAs (ncRNAs) have been identified, and many of them play vital roles in
various biological processes, including gene expression regulation, epigenetic regulation …

Understanding and scheduling weight decay

Z **e, I Sato, M Sugiyama - 2020 - openreview.net
Weight decay is a popular and even necessary regularization technique for training deep
neural networks that generalize well. Previous work usually interpreted weight decay as a …

TPN: Transferable Proto-Learning Network towards Few-shot Document-Level Relation Extraction

Y Zhang, Z Kang - 2024 International Joint Conference on …, 2024 - ieeexplore.ieee.org
Few-shot document-level relation extraction suffers from poor performance due to the
challenging cross-domain transferability of NOTA (none-of-the-above) relation …