Comparative Classification of Prostate Cancer Data using the Support Vector Machine, Random Forest, Dualks and k-Nearest Neighbours

Sakouvogui, Kekoura

Abstract

This paper compares four classification tools, Support Vector Machine (SVM), Random Forest (RF), DualKS, and k-Nearest Neighbors (kNN), which are based on different statistical learning theories. The dataset consists of microarray gene expression measurements from 596 male patients with prostate cancer. After treatment, each patient was assigned one of three phenotype levels: PSA (Prostate-Specific Antigen), Systematic, or NED (No Evidence of Disease). The purpose of this research is to determine the performance of each classifier by selecting the kernels and parameters that give the best prediction rate for the phenotype. The paper begins with a discussion of previous implementations of the tools and their mathematical theories. The results showed that SVM, RF, and kNN achieved comparable, above-average performance, while DualKS did not. SVM also outperformed the kNN, RF, and DualKS classifiers.
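The parameter selection step described in the abstract can be illustrated with a minimal sketch for one of the four tools, kNN. This is not the thesis's code or data: the two-dimensional toy points, the candidate values of k, and the use of leave-one-out accuracy as the prediction rate are all assumptions made for illustration, and the helper names (`knn_predict`, `loo_accuracy`, `select_k`) are hypothetical. Only the three phenotype labels come from the abstract.

```python
from collections import Counter
import math


def knn_predict(train_X, train_y, x, k):
    """Predict the label of x by majority vote among its k nearest training points."""
    neighbours = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], x),  # Euclidean distance
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def loo_accuracy(X, y, k):
    """Leave-one-out accuracy: hold out each sample once and predict it from the rest."""
    correct = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        if knn_predict(rest_X, rest_y, X[i], k) == y[i]:
            correct += 1
    return correct / len(X)


def select_k(X, y, candidates):
    """Score every candidate k and return the best one together with all scores."""
    scores = {k: loo_accuracy(X, y, k) for k in candidates}
    best_k = max(scores, key=scores.get)  # ties go to the earliest candidate
    return best_k, scores


# Toy data (assumption): three well-separated clusters standing in for the
# three phenotype levels. Real microarray data has thousands of features.
centres = {"PSA": (0.0, 0.0), "Systematic": (5.0, 5.0), "NED": (0.0, 5.0)}
offsets = [(-0.5, 0.0), (0.5, 0.0), (0.0, -0.5), (0.0, 0.5), (0.3, 0.3)]
X, y = [], []
for label, (cx, cy) in centres.items():
    for dx, dy in offsets:
        X.append((cx + dx, cy + dy))
        y.append(label)

best_k, scores = select_k(X, y, candidates=[1, 3, 5])
print(best_k, scores)
```

The same pattern, scoring each candidate setting on held-out samples and keeping the best, would apply to choosing an SVM kernel or the number of trees in a random forest, with the classifier swapped in for `knn_predict`.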

URI

https://hdl.handle.net/10365/27698

Collections

Statistics Masters Theses