Improved Genetic Programming Techniques For Data Classification
Abstract
Evolutionary algorithms are a category of optimization techniques inspired by the process of biological evolution. Evolutionary computation has been applied to many domains, one of the most important of which is data mining. Data mining is a broad field concerned with the automatic discovery of knowledge from databases, and it is one of the most developed fields in artificial intelligence. Classification is a data mining method that assigns the items in a collection to target classes, with the goal of accurately predicting the target class of each item in the data. Genetic programming (GP) is an effective evolutionary computation technique for solving classification problems. GP treats classification as an optimization task, searching for the solution with the highest accuracy. However, GP suffers from several weaknesses, such as long execution times and the need to tune many parameters for each problem. Furthermore, GP cannot achieve accuracy on multiclass classification problems as high as it achieves on binary problems. In this dissertation, we address these drawbacks and propose several approaches to overcome them. Adaptive GP variants are proposed to adapt the parameter settings automatically and to shorten the execution time. Moreover, two approaches are proposed to improve the accuracy of GP on multiclass classification problems. In addition, a segment-based approach is proposed to accelerate GP execution for the data classification problem. Finally, a parallelization of the GP process based on the MapReduce methodology is proposed. It aims to shorten the GP execution time and to make large population sizes practical, leading to faster convergence. The proposed approaches are evaluated using several measures, including accuracy, execution time, sensitivity and specificity, together with statistical tests.
The proposed approaches were compared with standard GP and with other classification techniques. The results show that these approaches overcome the drawbacks of standard GP, improving both accuracy and execution time.
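The abstract frames GP classification as an optimization task. Candidate programs are scored by accuracy, and sensitivity and specificity are reported alongside it. The Python sketch below illustrates that idea for a binary problem and rests on several assumptions. Individuals are expression trees over numeric features, and a non-negative output is read as class 1. The search uses only tournament selection, subtree mutation and elitism, with no crossover. All names (`evaluate`, `metrics`, `evolve` and so on) are hypothetical. This is a generic tree-based GP illustration, not the adaptive, segment-based or MapReduce variants proposed in the dissertation.

```python
import operator
import random

# Assumed function set: binary arithmetic operators (a typical, illustrative GP choice).
FUNCS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(tree, x):
    """Evaluate an expression tree on one feature vector x.
    A tree is a terminal ("x", feature_index) or ("c", constant),
    or an internal node (op, left_subtree, right_subtree)."""
    kind = tree[0]
    if kind == "x":
        return x[tree[1]]
    if kind == "c":
        return tree[1]
    return FUNCS[kind](evaluate(tree[1], x), evaluate(tree[2], x))


def classify(tree, x):
    # Assumed decision rule: non-negative program output -> class 1, else class 0.
    return 1 if evaluate(tree, x) >= 0 else 0


def metrics(tree, X, y):
    """Return (accuracy, sensitivity, specificity) of a tree on labelled data."""
    tp = tn = fp = fn = 0
    for xi, yi in zip(X, y):
        p = classify(tree, xi)
        if p == 1 and yi == 1:
            tp += 1
        elif p == 0 and yi == 0:
            tn += 1
        elif p == 1:
            fp += 1
        else:
            fn += 1
    accuracy = (tp + tn) / len(y)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return accuracy, sensitivity, specificity


def random_tree(rng, n_features, depth):
    # Grow a random tree; stop early with probability 0.3 or at depth 0.
    if depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.5:
            return ("x", rng.randrange(n_features))
        return ("c", rng.uniform(-1, 1))
    op = rng.choice(list(FUNCS))
    return (op, random_tree(rng, n_features, depth - 1),
            random_tree(rng, n_features, depth - 1))


def mutate(rng, tree, n_features, depth=2):
    # Subtree mutation: walk down a random path and replace a node on it
    # with a freshly grown random subtree.
    if tree[0] in ("x", "c") or rng.random() < 0.3:
        return random_tree(rng, n_features, depth)
    if rng.random() < 0.5:
        return (tree[0], mutate(rng, tree[1], n_features, depth), tree[2])
    return (tree[0], tree[1], mutate(rng, tree[2], n_features, depth))


def evolve(X, y, pop_size=50, generations=20, seed=0):
    """Search for the tree with the highest training accuracy."""
    rng = random.Random(seed)
    n_features = len(X[0])
    pop = [random_tree(rng, n_features, 3) for _ in range(pop_size)]
    fitness = lambda t: metrics(t, X, y)[0]  # fitness = training accuracy
    best = max(pop, key=fitness)
    for _ in range(generations):
        new_pop = [best]  # elitism: the best individual survives unchanged
        while len(new_pop) < pop_size:
            parent = max(rng.sample(pop, 3), key=fitness)  # tournament of size 3
            new_pop.append(mutate(rng, parent, n_features))
        pop = new_pop
        best = max(pop, key=fitness)
    return best
```

For example, on a one-feature toy set with `X = [[0.0], [1.0], [0.2], [0.9]]` and `y = [0, 1, 0, 1]`, the tree `("-", ("x", 0), ("c", 0.5))` separates the classes perfectly. `metrics` therefore reports accuracy, sensitivity and specificity of 1.0 for it. Because of elitism, the best fitness never decreases from one generation to the next. Repeatedly calling `metrics` inside `evolve` is the costly part, which is the kind of execution-time burden the abstract's acceleration and parallelization approaches target.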