Clustering Algorithm Comparison for Ellipsoidal Data
Abstract
Cluster analysis is a statistical technique for grouping data points into meaningful clusters. The purpose of this paper is to compare several clustering algorithms and determine which performs best on Gaussian data of varying complexity. The algorithms compared are Partitioning Around Medoids (PAM), K-means, and hierarchical clustering with seven linkages: Ward’s linkage, single linkage, complete linkage, average linkage, McQuitty’s method, Gower’s method, and the centroid method. The complexity of the data is varied through the number of dimensions, the average pairwise overlap between clusters, and the number of points simulated from each cluster. After the data are simulated, the Adjusted Rand Index will be used to gauge the performance of each clustering. A t-test will then be used to determine whether any clustering algorithms perform as well as the other clustering algorithms.
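The Adjusted Rand Index compares a clustering against the known labels of the simulated clusters. It counts pairs of points that are grouped the same way in both labelings and corrects that count for chance agreement. A value of 1 means perfect agreement, values near 0 are what random labelings give, and negative values are worse than chance. The following is a minimal, standard-library-only Python sketch of the Hubert and Arabie formula, computed from the contingency table of the two labelings. The function name adjusted_rand_index is hypothetical and is not taken from this paper. In practice, a library routine such as scikit-learn's adjusted_rand_score computes the same quantity.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two labelings of the same points.

    Hypothetical helper for illustration; labels may be any hashable values.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    if n < 2:
        # With fewer than two points there are no pairs to compare.
        return 1.0

    # Contingency table: how many points share each (true, predicted) label pair.
    cell_counts = Counter(zip(labels_true, labels_pred))
    row_sums = Counter(labels_true)   # cluster sizes in the true labeling
    col_sums = Counter(labels_pred)   # cluster sizes in the predicted labeling

    # Number of point pairs placed together in both labelings.
    index = sum(comb(c, 2) for c in cell_counts.values())
    sum_rows = sum(comb(a, 2) for a in row_sums.values())
    sum_cols = sum(comb(b, 2) for b in col_sums.values())

    # Expected index under random labelings with the same cluster sizes,
    # and the maximum attainable index.
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2

    if max_index == expected:
        # Degenerate case, e.g. both labelings are a single cluster
        # or both put every point in its own cluster.
        return 1.0
    return (index - expected) / (max_index - expected)


if __name__ == "__main__":
    truth = [0, 0, 1, 1]
    print(adjusted_rand_index(truth, [1, 1, 0, 0]))  # relabeled but identical partition
    print(adjusted_rand_index(truth, [0, 0, 1, 2]))  # one true cluster split in two
    print(adjusted_rand_index(truth, [0, 1, 0, 1]))  # disagrees with the truth
```

Because the index depends only on which points are grouped together, relabeling the clusters does not change the score. This matters here because PAM, K-means, and the hierarchical methods assign arbitrary cluster numbers.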