Small Effect Sizes in Malware Detection? Make Harder Train/Test Splits!

T Patel, F Lu, E Raff, C Nicholas, C Matuszek… - arXiv preprint arXiv …, 2023 - arxiv.org
Industry practitioners care about small improvements in malware detection accuracy
because their models are deployed to hundreds of millions of machines, meaning a 0.1 …

Endpoint detection and response: Why use machine learning?

NNA Sjarif, S Chuprat, MN Mahrin… - … on Information and …, 2019 - ieeexplore.ieee.org
Threats towards cyberspace have become more aggressive and intelligent, and some
attack in real time. This has urged both researchers and practitioners to secure cyberspace …

HAC-T and fast search for similarity in security

J Oliver, M Ali, J Hagen - 2020 International Conference on …, 2020 - ieeexplore.ieee.org
Similarity digests have gained popularity for many security applications like blacklisting/
whitelisting, and finding similar variants of malware. TLSH has been shown to be particularly …

Scalable malware clustering using multi-stage tree parallelization

M Ali, J Hagen, J Oliver - 2020 IEEE International Conference …, 2020 - ieeexplore.ieee.org
Similarity hashing is an important tool for searching and analyzing malware samples which
are similar to known malware samples. Several similarity hashing schemes exist in the …

Efficient and accurate non-metric k-NN search with applications to text matching

L Boytsov - 2018 - lti.cmu.edu
In this thesis we advance the state of the art of non-metric k-NN search by carrying out an
extensive empirical evaluation (both extrinsic and intrinsic) of generic methods for k-NN search. This …

ClarAVy: A Tool for Scalable and Accurate Malware Family Labeling

RJ Joyce, D Everett, M Fuchs, E Raff, J Holt - arXiv preprint arXiv …, 2025 - arxiv.org
Determining the family to which a malicious file belongs is an essential component of
cyberattack investigation, attribution, and remediation. Performing this task manually is time …

Carbon Filter: Real-time Alert Triage Using Large Scale Clustering and Fast Search

J Oliver, R Batta, A Bates, MA Inam, S Mehta… - arXiv preprint arXiv …, 2024 - arxiv.org
"Alert fatigue" is one of the biggest challenges faced by the Security Operations Center
(SOC) today, with analysts spending more than half of their time reviewing false alerts …

Evaluation of files for cyber threats using a machine learning model

JJ Oliver, CY Chang, WK Tsao, LH Hsu - US Patent 11,182,481, 2021 - Google Patents
A system for evaluating files for cyber threats includes a machine learning
model and a locality sensitive hash (LSH) repository. When the machine learning model …

Generation of file digests for cybersecurity applications

CM Chiang, PH Hao, KC Wang - US Patent 11,068,595, 2021 - Google Patents
A cybersecurity server receives an executable file. The executable file is
disassembled to generate assembly code of the executable file. High-entropy blocks and …

Evaluation of files for cybersecurity threats using global and local file information

CY Chang, WK Tsao - US Patent 11,151,250, 2021 - Google Patents
A global locality sensitive hash (LSH) database stores global locality
sensitive hashes of files of different private computer networks. Each of the private computer …