Entropy as a Criterion for Variable Reduction in Cluster Data
Abstract
Entropy measures the randomness of a system's state. In a clustering context, it quantifies the uncertainty associated with assigning each observation to a specific cluster. We examine this property and its potential use in analyzing high-dimensional datasets. Entropy is most useful for identifying dimensions that do not contribute meaningfully to the classification of observations into the clusters present. We remove the least important dimension(s) and generalize this idea into a procedure. After identifying all dimensions that should be eliminated from the dataset, we assess the reduced dataset by comparing the estimated classification of the observations with their true classification. The results presented in this paper indicate that entropy is a good candidate criterion for variable reduction.
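To make the notion of per-observation uncertainty concrete, the following is a minimal sketch and not the procedure developed in the paper. It assumes that each observation comes with a vector of cluster-membership probabilities and that uncertainty is measured by the Shannon entropy of that vector. The abstract does not state the exact formula, so this choice is an assumption, and the function names `membership_entropy` and `total_entropy` and the toy probabilities are hypothetical.

```python
import math


def membership_entropy(probs):
    """Shannon entropy (in nats) of one observation's membership probabilities.

    Assumption: `probs` is a sequence of non-negative values summing to 1.
    Zero probabilities are skipped, following the convention 0 * log(0) = 0.
    """
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def total_entropy(prob_matrix):
    """Sum of per-observation entropies; lower means crisper cluster assignments."""
    return sum(membership_entropy(row) for row in prob_matrix)


# Hypothetical membership probabilities for two observations and two clusters.
confident = [[0.99, 0.01], [0.02, 0.98]]  # assignments are nearly certain
uncertain = [[0.50, 0.50], [0.60, 0.40]]  # assignments are close to a coin flip

print(total_entropy(confident) < total_entropy(uncertain))
```

Under these assumptions, one way to judge a dimension would be to compare the total entropy of a clustering fitted with and without it. A dimension whose removal leaves the entropy essentially unchanged, or lowers it, would then be a candidate for elimination.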