Mapreduce-Enabled Scalable Nature-Inspired Approaches for Clustering

dc.contributor.author: Aljarah, Ibrahim Mithgal
dc.date.accessioned: 2017-12-19T21:06:17Z
dc.date.available: 2017-12-19T21:06:17Z
dc.date.issued: 2014
dc.description.abstract: The increasing volume of data to be analyzed poses new challenges for data mining methodologies. Traditional data mining methods, such as clustering, do not scale well with larger data sizes and are computationally expensive in terms of memory and time. Clustering large data sets has received attention in the last few years in several application areas such as document categorization, which is in urgent need of scalable approaches. Swarm intelligence algorithms have self-organizing features, which are used to share knowledge among swarm members to locate the best solution. These algorithms have been successfully applied to clustering; however, they suffer from scalability issues when large data sets are involved. To satisfy these needs, new parallel scalable clustering methods need to be developed. The MapReduce framework has become a popular model for parallelizing data-intensive applications due to its features such as fault tolerance, scalability, and usability. However, the challenge is to formulate the tasks with map and reduce functions. This dissertation first presents a scalable particle swarm optimization (MR-CPSO) clustering algorithm that is based on the MapReduce framework. Experimental results reveal that the proposed algorithm scales very well with increasing data set sizes while maintaining good clustering quality. Moreover, a parallel intrusion detection system using MR-CPSO is introduced. This system has been tested on a real large-scale intrusion data set to confirm its scalability and detection quality. In addition, the MapReduce framework is utilized to implement a parallel glowworm swarm optimization (MR-GSO) algorithm to optimize difficult multimodal functions. The experiments demonstrate that MR-GSO can achieve high function peak capture rates. Moreover, this dissertation presents a new clustering algorithm based on GSO (CGSO). CGSO takes into account the multimodal search capability to locate optimal centroids in order to enhance the clustering quality without the need to provide the number of clusters in advance. The experimental results demonstrate that CGSO outperforms other well-known clustering algorithms. In addition, a MapReduce version of the GSO clustering algorithm (MRCGSO) is introduced to evaluate the algorithm's scalability with large-scale data sets. MRCGSO achieves good speedup and utilization when more computing nodes are used. [en_US]
dc.identifier.uri: https://hdl.handle.net/10365/27094
dc.publisher: North Dakota State University [en_US]
dc.rights: NDSU Policy 190.6.2
dc.rights.uri: https://www.ndsu.edu/fileadmin/policy/190.pdf
dc.title: Mapreduce-Enabled Scalable Nature-Inspired Approaches for Clustering [en_US]
dc.type: Dissertation [en_US]
ndsu.advisor: Ludwig, Simone
ndsu.college: Engineering [en_US]
ndsu.degree: Doctor of Philosophy (PhD) [en_US]
ndsu.department: Computer Science [en_US]
ndsu.program: Computer Science [en_US]

Files

Original bundle

Name: Mapreduce-Enabled Scalable Nature-Inspired Approaches for Clustering.pdf
Size: 3.48 MB
Format: Adobe Portable Document Format
