Understanding inverse document frequency: on theoretical arguments for IDF
S Robertson - Journal of documentation, 2004 - emerald.com
The term‐weighting function known as IDF was proposed in 1972, and has since been
extremely widely used, usually as part of a TF*IDF function. It is often described as a …
An information-theoretic perspective of tf–idf measures
A Aizawa - Information Processing & Management, 2003 - Elsevier
This paper presents a mathematical definition of the “probability-weighted amount of
information” (PWI), a measure of specificity of terms in documents that is based on an …
Kairos: Practical intrusion detection and investigation using whole-system provenance
Provenance graphs are structured audit logs that describe the history of a system's
execution. Recent studies have explored a variety of techniques to analyze provenance …
Computing semantic similarity of concepts in knowledge graphs
G Zhu, CA Iglesias - IEEE Transactions on Knowledge and Data …, 2016 - ieeexplore.ieee.org
This paper presents a method for measuring the semantic similarity between concepts in
Knowledge Graphs (KGs) such as WordNet and DBpedia. Previous work on semantic …
Dispersions and adjusted frequencies in corpora
ST Gries - International journal of corpus linguistics, 2008 - jbe-platform.com
The most frequent statistics in corpus linguistics are frequencies of occurrence and
frequencies of co-occurrence of two or more linguistic variables. However, such frequencies …
Adding semantics to microblog posts
Microblogs have become an important source of information for the purpose of marketing,
intelligence, and reputation management. Streams of microblogs are of great value because …
[BOOK] An introduction to search engines and web navigation
M Levene - 2011 - books.google.com
This book is a second edition, updated and expanded to explain the technologies that help
us find information on the web. Search engines and web navigation tools have become …
Hot topic detection based on a refined TF-IDF algorithm
Z Zhu, J Liang, D Li, H Yu, G Liu - IEEE access, 2019 - ieeexplore.ieee.org
In this paper, we propose a refined term frequency inversed document frequency (TF-IDF)
algorithm called TA TF-IDF to find hot terms, based on time distribution information and user …
Frequency in lexical processing
Background: Frequency of occurrence is a strong predictor of lexical processing across
modalities and experimental paradigms. However, frequency is part of a large set of …
Geoscience keyphrase extraction algorithm using enhanced word embedding
A large amount of unstructured textual data about geoscience structures and minerals is
buried in geoscience documents and is unused. Automatic keyphrase extraction provides …