Comparative Classification of Prostate Cancer Data using the Support Vector Machine, Random Forest, DualKS and k-Nearest Neighbours
Abstract
This paper compares four classification tools, Support Vector Machine (SVM), Random
Forest (RF), DualKS and k-Nearest Neighbors (kNN), which are based on different statistical
learning theories. The dataset consists of microarray gene expression data from 596 male patients with
prostate cancer. After treatment, the patients were classified by a single phenotype with
three levels: PSA (Prostate-Specific Antigen), Systematic and NED (No Evidence of Disease).
The purpose of this research is to determine the performance rate of each classifier by selecting
the optimal kernels and parameters that give the best prediction rate of the phenotype. The
paper begins with a discussion of previous implementations of the tools and their
mathematical theories. The results showed that three classifiers (SVM, RF and kNN) achieved
comparable, above-average performance, while DualKS did not. We also observed that SVM
outperformed the kNN, RF and DualKS classifiers.