Pre-trained language models in biomedical domain: A systematic survey

B Wang, Q Xie, J Pei, Z Chen, P Tiwari, Z Li… - ACM Computing …, 2023 - dl.acm.org
Pre-trained language models (PLMs) have been the de facto paradigm for most natural
language processing tasks. This also benefits the biomedical domain: researchers from …

Data-driven materials research enabled by natural language processing and information extraction

EA Olivetti, JM Cole, E Kim, O Kononova… - Applied Physics …, 2020 - pubs.aip.org
Given the emergence of data science and machine learning throughout all aspects of
society, but particularly in the scientific domain, there is increased importance placed on …

ScispaCy: fast and robust models for biomedical natural language processing

M Neumann, D King, I Beltagy, W Ammar - arXiv preprint arXiv …, 2019 - arxiv.org
Despite recent advances in natural language processing, many statistical models for
processing text perform extremely poorly under domain shift. Processing biomedical and …

Construction of the literature graph in semantic scholar

W Ammar, D Groeneveld, C Bhagavatula… - arXiv preprint arXiv …, 2018 - arxiv.org
We describe a deployed scalable system for organizing published scientific literature into a
heterogeneous graph to facilitate algorithmic manipulation and discovery. The resulting …

Autonomous discovery in the chemical sciences part I: Progress

CW Coley, NS Eyke, KF Jensen - … Chemie International Edition, 2020 - Wiley Online Library
This two‐part Review examines how automation has contributed to different aspects of
discovery in the chemical sciences. In this first part, we describe a classification for …

ASRNN: A recurrent neural network with an attention model for sequence labeling

JCW Lin, Y Shao, Y Djenouri, U Yun - Knowledge-Based Systems, 2021 - Elsevier
Natural language processing (NLP) is useful for handling text and speech, and sequence
labeling plays an important role by automatically analyzing a sequence (text) to assign …

An attention-based BiLSTM-CRF approach to document-level chemical named entity recognition

L Luo, Z Yang, P Yang, Y Zhang, L Wang, H Lin… - …, 2018 - academic.oup.com
Motivation: In biomedical research, chemical is an important class of entities, and chemical
named entity recognition (NER) is an important task in the field of biomedical information …

ChemDataExtractor: a toolkit for automated extraction of chemical information from the scientific literature

MC Swain, JM Cole - Journal of chemical information and …, 2016 - ACS Publications
The emergence of “big data” initiatives has led to the need for tools that can automatically
extract valuable chemical information from large volumes of unstructured data, such as the …

Taiyi: a bilingual fine-tuned large language model for diverse biomedical tasks

L Luo, J Ning, Y Zhao, Z Wang, Z Ding… - Journal of the …, 2024 - academic.oup.com
Objective: Most existing fine-tuned biomedical large language models (LLMs) focus on
enhancing performance in monolingual biomedical question answering and conversation …

Autonomous discovery in the chemical sciences part II: outlook

CW Coley, NS Eyke, KF Jensen - … Chemie International Edition, 2020 - Wiley Online Library
This two‐part Review examines how automation has contributed to different aspects of
discovery in the chemical sciences. In this second part, we reflect on a selection of …