Entropy as a Criterion for Variable Reduction in Cluster Data

Olson, Christopher

Files

Entropy as a Criterion for Variable Reduction in Cluster Data.pdf (402.82 KB)

Date

2012

Authors

Olson, Christopher

Publisher

North Dakota State University

Abstract

Entropy is a measure of the randomness of a system's state. It quantifies the uncertainty associated with assigning each observation to a specific cluster. We examine this property and its potential use in analyzing high-dimensional datasets. Entropy proves most interesting for identifying dimensions that do not contribute meaningfully to the classification of the clusters present. We can remove the least important dimension(s) found and generalize this idea into a procedure. After identifying all the dimensions that should be eliminated from the dataset, we compare the estimated classification of the observations against their true classification to assess how well the reduced data recover it. The results presented in this paper indicate that entropy is a good candidate criterion for variable reduction.

URI

https://hdl.handle.net/10365/26760

Collections

Statistics Masters Theses
