Comparison of Classification Rates among Logistic Regression, Neural Network and Support Vector Machines in the Presence of Missing Data

Upadhyaya, Sudhi

Files

Comparison of Classification Rates among Logistic Regression, Neural Network and Support Vector Machines in the Presence of Missing Data.pdf (1.06 MB)

Date

2014

Authors

Upadhyaya, Sudhi

Publisher

North Dakota State University

Abstract

Statistical models such as Logistic Regression (LR), Neural Network (NN), and Support Vector Machines (SVM) are often fit to datasets with missing values when making inferences about a population. The presence of missing data can severely skew the results and reduce the efficiency of the model. Our objective was to identify a robust model among LR, NN, and SVM in the presence of missing data. The study was conducted by simulating observations using Monte Carlo methods, with missing data introduced at random at the 10% level. Single mode imputation was used to impute the missing values. Simple random samples of 120, 240, and 500 observations were drawn, and the three models were fit for two scenarios. Results showed that SVM performed far better than the LR or NN models. However, the classification accuracy of SVM gradually decreased as sample size increased.
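
The missing-data step described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's code: the abstract does not state the variable types or the data-generating model, so the single categorical column, the category labels, the seed, and the helper names `simulate_column`, `introduce_missing`, and `impute_mode` are all assumptions made for illustration. The sketch covers only random deletion at the 10% level and mode imputation for the three sample sizes; fitting LR, NN, and SVM is left out.

```python
import random
from collections import Counter


def simulate_column(n, categories, rng):
    """Draw n categorical observations (hypothetical data-generating step)."""
    return [rng.choice(categories) for _ in range(n)]


def introduce_missing(values, rate, rng):
    """Set a randomly chosen fraction `rate` of the entries to None."""
    n_missing = round(rate * len(values))
    missing_idx = set(rng.sample(range(len(values)), n_missing))
    return [None if i in missing_idx else v for i, v in enumerate(values)]


def impute_mode(values):
    """Replace each None with the most frequent observed value."""
    observed = [v for v in values if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]


rng = random.Random(0)  # fixed seed, an assumption for reproducibility
for n in (120, 240, 500):  # sample sizes named in the abstract
    column = simulate_column(n, ["a", "b", "c"], rng)
    with_missing = introduce_missing(column, 0.10, rng)
    imputed = impute_mode(with_missing)
    print(n, with_missing.count(None), imputed.count(None))
```

Because every missing entry receives the same value, mode imputation inflates the frequency of the most common category, which is one way imputed data can distort a fitted classifier.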

URI

https://hdl.handle.net/10365/23948

Collections

Statistics Masters Papers
