CatBoost for big data: an interdisciplinary review

JT Hancock, TM Khoshgoftaar - Journal of big data, 2020 - Springer
Abstract Gradient Boosted Decision Trees (GBDT's) are a powerful tool for classification and
regression tasks in Big Data. Researchers should be familiar with the strengths and …

A literature review on one-class classification and its potential applications in big data

N Seliya, A Abdollah Zadeh, TM Khoshgoftaar - Journal of Big Data, 2021 - Springer
In severely imbalanced datasets, using traditional binary or multi-class classification typically
leads to bias towards the class(es) with the much larger number of instances. Under such …

Gradient boosting decision tree becomes more reliable than logistic regression in predicting probability for diabetes with big data

H Seto, A Oyama, S Kitora, H Toki, R Yamamoto… - Scientific reports, 2022 - nature.com
We sought to verify the reliability of machine learning (ML) in developing diabetes prediction
models by utilizing big data. To this end, we compared the reliability of gradient boosting …

Detecting web attacks using random undersampling and ensemble learners

R Zuech, J Hancock, TM Khoshgoftaar - Journal of Big Data, 2021 - Springer
Class imbalance is an important consideration for cybersecurity and machine learning. We
explore classification performance in detecting web attacks in the recent CSE-CIC-IDS2018 …

Machine learning prediction of lignin content in poplar with Raman spectroscopy

W Gao, L Zhou, S Liu, Y Guan, H Gao, B Hui - Bioresource Technology, 2022 - Elsevier
Based on features extracted from Raman spectra, regularization algorithms, SVR, DT, RF,
LightGBM, CatBoost, and XGBoost were used to develop prediction models for lignin …

Investigating the effectiveness of one-class and binary classification for fraud detection

JL Leevy, J Hancock, TM Khoshgoftaar… - Journal of Big Data, 2023 - Springer
Research into machine learning methods for fraud detection is of paramount importance,
largely due to the substantial financial implications associated with fraudulent activities. Our …

Data integration challenges for machine learning in precision medicine

M Martínez-García, E Hernández-Lemus - Frontiers in medicine, 2022 - frontiersin.org
A main goal of Precision Medicine is that of incorporating and integrating the vast corpora on
different databases about the molecular and environmental origins of disease, into analytic …

Leveraging LightGBM for categorical big data

J Hancock, TM Khoshgoftaar - 2021 IEEE Seventh International …, 2021 - ieeexplore.ieee.org
LightGBM is a popular Gradient Boosted Decision Tree implementation for classification and
regression tasks. Our contribution is to answer a research question regarding LightGBM. We …

Data-centric solutions for addressing big data veracity with class imbalance, high dimensionality, and class overlapping

A Bolívar, V García, R Alejo, R Florencia-Juárez… - Applied Sciences, 2024 - mdpi.com
An innovative strategy for organizations to obtain value from their large datasets, allowing
them to guide future strategic actions and improve their initiatives, is the use of machine …

A deep LSTM autoencoder-based framework for predictive maintenance of a proton radiotherapy delivery system

T Dou, B Clasie, N Depauw, T Shen, R Brett… - Artificial Intelligence in …, 2022 - Elsevier
Introduction Unscheduled machine downtime can cause treatment interruptions and
adversely impact patient treatment outcomes. Conventional Quality Assurance (QA) …