Consumer segmentation with large language models

Y Li, Y Liu, M Yu - Journal of Retailing and Consumer Services, 2025 - Elsevier
Consumer segmentation is vital for companies to customize their offerings effectively. Our
study explores the application of Large Language Models (LLMs) in marketing research for …

Iterative cleaning and learning of big highly-imbalanced fraud data using unsupervised learning

RKL Kennedy, Z Salekshahrezaee, F Villanustre… - Journal of Big Data, 2023 - Springer
Fraud datasets often times lack consistent and accurate labels, and are characterized by
having high class imbalance where the number of fraudulent examples are far fewer than …

Fraud detection in healthcare claims using machine learning: A systematic review

A du Preez, S Bhattacharya, P Beling… - Artificial Intelligence in …, 2024 - Elsevier
Objective: Identifying fraud in healthcare programs is crucial, as an estimated 3%–10% of
the total healthcare expenditures are lost to fraudulent activities. This study presents a …

Medical provider embeddings for healthcare fraud detection

JM Johnson, TM Khoshgoftaar - SN Computer Science, 2021 - Springer
Advances in data mining and machine learning continue to transform the healthcare industry
and provide value to medical professionals and patients. In this study, we address the …

Leveraging LightGBM for categorical big data

J Hancock, TM Khoshgoftaar - 2021 IEEE Seventh International …, 2021 - ieeexplore.ieee.org
LightGBM is a popular Gradient Boosted Decision Tree implementation for classification and
regression tasks. Our contribution is to answer a research question regarding LightGBM. We …

Encoding high-dimensional procedure codes for healthcare fraud detection

JM Johnson, TM Khoshgoftaar - SN Computer Science, 2022 - Springer
Machine learning applications for healthcare are reshaping the industry with new
tools and services designed to improve the quality of patient care. A challenge common to …

Categorical feature encoding techniques for improved classifier performance when dealing with imbalanced data of fraudulent transactions

D Breskuvienė, G Dzemyda - International Journal of Computers …, 2023 - fsja.univagora.ro
Fraudulent transaction data tend to have several categorical features with high cardinality. It
makes data preprocessing complicated if categories in such features do not have an order …

Encoding techniques for high-cardinality features and ensemble learners

JM Johnson, TM Khoshgoftaar - 2021 IEEE 22nd international …, 2021 - ieeexplore.ieee.org
This study evaluates the classification performance of five encoding techniques for high-
cardinality categorical features. Encoding techniques are tested using popular bagging and …

Robust thresholding strategies for highly imbalanced and noisy data

JM Johnson, TM Khoshgoftaar - 2021 20th IEEE International …, 2021 - ieeexplore.ieee.org
Many studies have shown that non-default decision thresholds are required to maximize
classification performance on highly imbalanced data sets. Thresholding strategies include …

Exploring maximum tree depth and random undersampling in ensemble trees to optimize the classification of imbalanced big data

JT Hancock III, TM Khoshgoftaar - SN Computer Science, 2023 - Springer
We present findings from experiments in Medicare fraud detection, that are the result of
research on two new, publicly available datasets. In this research, we employ popular, open …