Information retrieval and text mining technologies for chemistry
Efficient access to chemical information contained in scientific literature, patents, technical
reports, or the web is a pressing need shared by researchers and patent attorneys from …
A survey on the state-of-the-art machine learning models in the context of NLP
Kuwait J. Sci. 43 (4) pp. 95-113, 2016. A survey on the state-of-the-art machine learning
models in the context of NLP. Wahab Khan, Ali Daud …
Study of various methods for tokenization
A Rai, S Borah - Applications of Internet of Things: Proceedings of …, 2021 - Springer
Tokenization is the mechanism of splitting or fragmenting the sentences and words to its
possible smallest morpheme called as token. Morpheme is smallest possible word after …
Differential gene expression in disease: a comparison between high-throughput studies and the literature
R Rodriguez-Esteban, X Jiang - BMC medical genomics, 2017 - Springer
Background Differential gene expression is important to understand the biological
differences between healthy and diseased states. Two common sources of differential gene …
Complex event extraction at PubMed scale
Motivation: There has recently been a notable shift in biomedical information extraction (IE)
from relation models toward the more expressive event model, facilitated by the maturation …
Analysis of biological processes and diseases using text mining approaches
A number of biomedical text mining systems have been developed to extract biologically
relevant information directly from the literature, complementing bioinformatics methods in the …
Word and sentence tokenization with Hidden Markov Models
We present a novel method (“waste”) for the segmentation of text into tokens and sentences.
Our approach makes use of a Hidden Markov Model for the detection of segment …
SOAP classifier for free-text clinical notes with domain-specific pre-trained language models
The increasing use of electronic health records (EHRs) in healthcare has led to a significant
amount of unstructured clinical text data. This paper proposes a model for classifying free …
Tokenizing micro-blogging messages using a text classification approach
The automatic processing of microblogging messages may be problematic, even in the case
of very elementary operations such as tokenization. The problems arise from the use of non …
Elephant: Sequence labeling for word and sentence segmentation
Tokenization is widely regarded as a solved problem due to the high accuracy that rule-based
tokenizers achieve. But rule-based tokenizers are hard to maintain and their rules …