Information retrieval and text mining technologies for chemistry

M Krallinger, O Rabal, A Lourenco, J Oyarzabal… - Chemical …, 2017 - ACS Publications
Efficient access to chemical information contained in scientific literature, patents, technical
reports, or the web is a pressing need shared by researchers and patent attorneys from …

A survey on the state-of-the-art machine learning models in the context of NLP

W Khan, A Daud, JA Nasir, T Amjad - Kuwait journal of Science, 2016 - journalskuwait.org
Kuwait J. Sci. 43 (4) pp. 95-113, 2016. A survey on the state-of-the-art machine learning
models in the context of NLP. Wahab Khan, Ali Daud …

Study of various methods for tokenization

A Rai, S Borah - Applications of Internet of Things: Proceedings of …, 2021 - Springer
Tokenization is the mechanism of splitting or fragmenting the sentences and words to its
possible smallest morpheme called as token. Morpheme is smallest possible word after …

Differential gene expression in disease: a comparison between high-throughput studies and the literature

R Rodriguez-Esteban, X Jiang - BMC medical genomics, 2017 - Springer
Background Differential gene expression is important to understand the biological
differences between healthy and diseased states. Two common sources of differential gene …

Complex event extraction at PubMed scale

J Björne, F Ginter, S Pyysalo, J Tsujii… - Bioinformatics, 2010 - academic.oup.com
Motivation: There has recently been a notable shift in biomedical information extraction (IE)
from relation models toward the more expressive event model, facilitated by the maturation …

Analysis of biological processes and diseases using text mining approaches

M Krallinger, F Leitner, A Valencia - Bioinformatics Methods in Clinical …, 2010 - Springer
A number of biomedical text mining systems have been developed to extract biologically
relevant information directly from the literature, complementing bioinformatics methods in the …

Word and sentence tokenization with Hidden Markov Models

B Jurish, KM Würzner - Journal for Language Technology and …, 2013 - jlcl.org
We present a novel method (“waste”) for the segmentation of text into tokens and sentences.
Our approach makes use of a Hidden Markov Model for the detection of segment …

SOAP classifier for free-text clinical notes with domain-specific pre-trained language models

JM de Oliveira, RS Antunes, CA da Costa - Expert Systems with …, 2024 - Elsevier
The increasing use of electronic health records (EHRs) in healthcare has led to a significant
amount of unstructured clinical text data. This paper proposes a model for classifying free …

Tokenizing micro-blogging messages using a text classification approach

G Laboreiro, L Sarmento, J Teixeira… - Proceedings of the fourth …, 2010 - dl.acm.org
The automatic processing of microblogging messages may be problematic, even in the case
of very elementary operations such as tokenization. The problems arise from the use of non …

Elephant: Sequence labeling for word and sentence segmentation

K Evang, V Basile, G Chrupała, J Bos - EMNLP 2013, 2013 - hal.science
Tokenization is widely regarded as a solved problem due to the high accuracy that rule-
based tokenizers achieve. But rule-based tokenizers are hard to maintain and their rules …