Detection of Fraudulent Medicare Providers using Decision Tree and Logistic Regression Models
DOI: https://doi.org/10.48047/
Keywords: Healthcare, Fraud detection, Supervised methods, Unsupervised methods
Abstract
With the overall increase in the elderly population come additional, necessary medical needs and
costs. Medicare is a U.S. healthcare program that provides insurance, primarily to individuals 65 years
or older, to offload some of the financial burden associated with medical care. Even so, healthcare
costs are high and continue to increase. Fraud is a major contributor to these rising healthcare
expenses. Our paper provides a comprehensive study leveraging machine learning methods to detect
fraudulent Medicare providers. We use publicly available Medicare data, with provider exclusions serving
as fraud labels, to build and assess three different learners. To lessen the impact of class
imbalance, given so few actual fraud labels, we employ random undersampling to create four class
distributions. Our results show that the C4.5 decision tree and logistic regression learners have the
best fraud detection performance, particularly for the 80:20 class distribution, with average AUC
scores of 0.883 and 0.882, respectively, and low false negative rates. We successfully demonstrate the
efficacy of employing machine learning with random undersampling to detect Medicare fraud.
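The abstract does not spell out how the class distributions are produced. The following is a minimal sketch of random undersampling to a target ratio. It assumes that "80:20" means 80% non-fraud (majority) to 20% fraud (minority) and that excluded providers are labeled 1. The function name `random_undersample` and the parameter `majority_pct` are hypothetical, not the authors' code.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_undersample(
    majority: Sequence[T],
    minority: Sequence[T],
    majority_pct: int,
    seed: int = 0,
) -> List[Tuple[T, int]]:
    """Hypothetical helper: randomly drop majority-class (non-fraud) rows so the
    result has a majority:minority ratio of about majority_pct:(100 - majority_pct).

    Assumption: every minority (fraud) row is kept; only the majority class shrinks.
    Returns shuffled (row, label) pairs, with label 1 = fraud and 0 = non-fraud.
    """
    if not 50 <= majority_pct < 100:
        raise ValueError("majority_pct must be in [50, 100)")
    minority_pct = 100 - majority_pct
    # Number of majority rows needed so that the minority class is minority_pct% of the result.
    target = (len(minority) * majority_pct) // minority_pct
    if target > len(majority):
        raise ValueError("not enough majority-class rows for this ratio")
    rng = random.Random(seed)
    # Sample majority rows without replacement.
    kept = rng.sample(list(majority), target)
    data = [(row, 0) for row in kept] + [(row, 1) for row in minority]
    rng.shuffle(data)  # mix the classes before training
    return data


if __name__ == "__main__":
    non_fraud = [f"provider_{i}" for i in range(1000)]  # illustrative IDs only
    fraud = [f"excluded_{i}" for i in range(20)]
    sample = random_undersample(non_fraud, fraud, majority_pct=80)
    print(len(sample), sum(label for _, label in sample))  # 100 20
```

Because only the majority class is reduced, every scarce fraud label is retained. The abstract reports average AUC scores. Repeating the sampling with different `seed` values and averaging the resulting scores is one way such averages could be obtained, though the exact protocol is not given here.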




