Understanding inverse document frequency: on theoretical arguments for IDF

S Robertson - Journal of documentation, 2004 - emerald.com
The term‐weighting function known as IDF was proposed in 1972, and has since been
extremely widely used, usually as part of a TF*IDF function. It is often described as a …

An information-theoretic perspective of tf–idf measures

A Aizawa - Information Processing & Management, 2003 - Elsevier
This paper presents a mathematical definition of the “probability-weighted amount of
information” (PWI), a measure of specificity of terms in documents that is based on an …

Kairos: Practical intrusion detection and investigation using whole-system provenance

Z Cheng, Q Lv, J Liang, Y Wang, D Sun… - … IEEE Symposium on …, 2024 - ieeexplore.ieee.org
Provenance graphs are structured audit logs that describe the history of a system's
execution. Recent studies have explored a variety of techniques to analyze provenance …

Computing semantic similarity of concepts in knowledge graphs

G Zhu, CA Iglesias - IEEE Transactions on Knowledge and Data …, 2016 - ieeexplore.ieee.org
This paper presents a method for measuring the semantic similarity between concepts in
Knowledge Graphs (KGs) such as WordNet and DBpedia. Previous work on semantic …

Dispersions and adjusted frequencies in corpora

ST Gries - International journal of corpus linguistics, 2008 - jbe-platform.com
The most frequent statistics in corpus linguistics are frequencies of occurrence and
frequencies of co-occurrence of two or more linguistic variables. However, such frequencies …

Adding semantics to microblog posts

E Meij, W Weerkamp, M De Rijke - … conference on Web search and data …, 2012 - dl.acm.org
Microblogs have become an important source of information for the purpose of marketing,
intelligence, and reputation management. Streams of microblogs are of great value because …

[BOOK] An introduction to search engines and web navigation

M Levene - 2011 - books.google.com
This book is a second edition, updated and expanded to explain the technologies that help
us find information on the web. Search engines and web navigation tools have become …

Hot topic detection based on a refined TF-IDF algorithm

Z Zhu, J Liang, D Li, H Yu, G Liu - IEEE access, 2019 - ieeexplore.ieee.org
In this paper, we propose a refined term frequency inversed document frequency (TF-IDF)
algorithm called TA TF-IDF to find hot terms, based on time distribution information and user …

Frequency in lexical processing

RH Baayen, P Milin, M Ramscar - Aphasiology, 2016 - Taylor & Francis
Background: Frequency of occurrence is a strong predictor of lexical processing across
modalities and experimental paradigms. However, frequency is part of a large set of …

Geoscience keyphrase extraction algorithm using enhanced word embedding

Q Qiu, Z Xie, L Wu, W Li - Expert Systems with Applications, 2019 - Elsevier
A large amount of unstructured textual data about geoscience structures and minerals is
buried in geoscience documents and is unused. Automatic keyphrase extraction provides …