Two Data Mining Applications for Predicting Pre-Diabetes
Abstract
This study compares the performance of Logistic Regression and Decision Tree modeling, both built in SAS Enterprise Miner, for predicting pre-diabetes in the US population using several common factors from the type 2 diabetes screening criteria. From 17 variables drawn from three NHANES datasets, 13 risk factors were selected as predictors of pre-diabetes. The comparison of the two data mining methods showed that the Decision Tree had a higher ROC index than Logistic Regression. The ROC indexes of both models were greater than 77%, indicating that both methods predict pre-diabetes well. The predictive accuracy of both models was greater than 72% on the whole dataset. Decision Tree modeling also yielded higher accuracy and sensitivity than Logistic Regression modeling. Taken as a whole, the results indicate that Decision Tree modeling is the better method for predicting pre-diabetes.
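For readers unfamiliar with the three comparison measures named in the abstract (ROC index, accuracy, and sensitivity), the following is a minimal Python sketch of how each can be computed from a model's predicted scores. It is illustrative only: the study itself used SAS Enterprise Miner, the function names are hypothetical, the labels and scores are made-up toy values rather than NHANES data, and the 0.5 classification cutoff is an assumption.

```python
def accuracy(y_true, y_pred):
    """Share of all cases classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def sensitivity(y_true, y_pred):
    """Share of true positive cases (label 1) that the model flags as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


def roc_index(y_true, scores):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive case scores higher than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (NOT study data): 1 = pre-diabetic, 0 = not pre-diabetic.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]  # hypothetical model scores
cutoff = 0.5                                        # assumed cutoff
y_pred = [1 if s >= cutoff else 0 for s in scores]

print("ROC index:  ", roc_index(y_true, scores))
print("Accuracy:   ", accuracy(y_true, y_pred))
print("Sensitivity:", sensitivity(y_true, y_pred))
```

Applying the same three functions to the scores of two competing models on the same cases gives the kind of side-by-side comparison reported here; note that the ROC index does not depend on the cutoff, whereas accuracy and sensitivity do.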