Fuzzy Reasoning Based Evolutionary Algorithms Applied to Data Mining
Abstract
Data mining and information retrieval are difficult tasks for several reasons. First, as the volume of data increases tremendously, most of the data are complex, large, imprecise, uncertain, or incomplete. Second, information retrieval may itself be imprecise or subjective. Users therefore require comprehensible results from the process of data mining or knowledge discovery. Fuzzy logic has become an active research area because of its capability to handle perceptual uncertainties, such as ambiguity or vagueness, and its ability to describe nonlinear systems. This dissertation focuses on two main paradigms. The first paradigm applies fuzzy inductive learning to classification problems. A fuzzy classifier based on discrete particle swarm optimization and a fuzzy decision tree classifier are implemented in this paradigm. The fuzzy classifier based on discrete particle swarm optimization includes a discrete particle swarm optimization classifier and a fuzzy discrete particle swarm optimization classifier. The discrete particle swarm optimization classifier is devised and applied to discrete data, whereas the fuzzy discrete particle swarm optimization classifier is an improved version that can handle both discrete and continuous data to manage uncertainty and imprecision. A fuzzy decision tree classifier is also proposed, with a feature selection method based on the ideas of mutual information and genetic algorithms. The second paradigm is fuzzy cluster analysis. Its purpose is to provide efficient approaches for identifying similar or dissimilar descriptions of data instances. The shapes of the clusters are either hyper-spherical or hyper-planed. A fuzzy c-means clustering approach based on particle swarm optimization, whose clustering prototype is hyper-spherical, is proposed to automatically determine the optimal number of clusters.
In addition, the fuzzy c-regression model, which has hyper-planed clusters, has received much attention in recent literature on nonlinear system identification and has been successfully employed in various areas. A fuzzy c-regression model clustering algorithm is therefore applied to color image segmentation.
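
For readers unfamiliar with the hyper-spherical case, the following is a minimal sketch of the standard fuzzy c-means iteration in plain Python. It is not the dissertation's method: the number of clusters is fixed by the caller rather than determined automatically, and no particle swarm optimization is involved. The function names, the fuzzifier default `m=2.0`, and the stopping tolerance are illustrative assumptions.

```python
import math

def fcm_memberships(points, centers, m=2.0):
    """Standard FCM membership update: u[i][k] is the degree to which
    point i belongs to cluster k; each row sums to 1."""
    power = 2.0 / (m - 1.0)
    u = []
    for p in points:
        d = [math.dist(p, c) for c in centers]
        if any(dk == 0.0 for dk in d):
            # Point sits exactly on a center: share full membership
            # among the coinciding centers to avoid division by zero.
            zeros = [1.0 if dk == 0.0 else 0.0 for dk in d]
            s = sum(zeros)
            u.append([z / s for z in zeros])
            continue
        inv = [dk ** (-power) for dk in d]
        total = sum(inv)
        u.append([v / total for v in inv])
    return u

def fcm_centers(points, u, m=2.0):
    """Standard FCM center update: mean of the points weighted by u**m."""
    dim = len(points[0])
    centers = []
    for k in range(len(u[0])):
        weights = [row[k] ** m for row in u]
        total = sum(weights)
        centers.append(tuple(
            sum(w * p[j] for w, p in zip(weights, points)) / total
            for j in range(dim)
        ))
    return centers

def fuzzy_c_means(points, init_centers, m=2.0, max_iter=100, tol=1e-6):
    """Alternate the two updates until the centers stop moving.
    The cluster count is len(init_centers) (assumed, not optimized)."""
    centers = [tuple(c) for c in init_centers]
    for _ in range(max_iter):
        u = fcm_memberships(points, centers, m)
        new_centers = fcm_centers(points, u, m)
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, fcm_memberships(points, centers, m)
```

Calling `fuzzy_c_means(points, init_centers)` with a list of coordinate tuples returns the final centers and the membership matrix. Because memberships depend only on Euclidean distance to each center, the resulting clusters are hyper-spherical, which is the prototype shape the abstract refers to.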