A Data Visualization Tool to Identify Patterns Formed by Subsets of Data
Abstract
An object may be identified by its properties, which may comprise both continuous
and categorical data values. Sometimes a particular value of a categorical property
selects a group of objects whose continuous attributes are similar. This feature is
observed in gene data sets as well as in other multivariate data sets.
This paper presents a tool to identify patterns formed by data subsets that are
defined by categorical attributes. The tool reads data from various data sources,
filters the data according to supplied criteria, and displays the result as a Parallel
Coordinates graph. By changing the selection criteria, that is, the values of the
various categorical columns, the user can select different subsets of the data and
examine how similar the objects within each subset are.
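
The filter-then-plot idea described above can be illustrated with a minimal sketch. It is
not the tool's actual class library: the class name SubsetFilterSketch, the Row type, the
column name "group", and the sample values are all hypothetical and chosen only for
illustration. The sketch selects the rows that match a categorical criterion and scales
each continuous column to [0, 1], which gives the height of each row's polyline on every
parallel axis.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class SubsetFilterSketch {

    /** One object: categorical properties by column name plus continuous values. */
    static final class Row {
        final Map<String, String> categorical;
        final double[] continuous;

        Row(Map<String, String> categorical, double[] continuous) {
            this.categorical = categorical;
            this.continuous = continuous;
        }
    }

    /** Keeps the rows whose categorical values match every entry in criteria. */
    static List<Row> filter(List<Row> rows, Map<String, String> criteria) {
        List<Row> subset = new ArrayList<>();
        for (Row row : rows) {
            boolean matches = true;
            for (Map.Entry<String, String> c : criteria.entrySet()) {
                if (!c.getValue().equals(row.categorical.get(c.getKey()))) {
                    matches = false;
                    break;
                }
            }
            if (matches) {
                subset.add(row);
            }
        }
        return subset;
    }

    /**
     * Scales each continuous column to [0, 1] using the range over all rows
     * (assumed non-empty), so that subsets stay comparable on the same axes.
     * Entry [i][j] is the height of subset row i on parallel axis j.
     */
    static double[][] axisPositions(List<Row> allRows, List<Row> subset) {
        int axes = allRows.get(0).continuous.length;
        double[] min = new double[axes];
        double[] max = new double[axes];
        Arrays.fill(min, Double.POSITIVE_INFINITY);
        Arrays.fill(max, Double.NEGATIVE_INFINITY);
        for (Row row : allRows) {
            for (int j = 0; j < axes; j++) {
                min[j] = Math.min(min[j], row.continuous[j]);
                max[j] = Math.max(max[j], row.continuous[j]);
            }
        }
        double[][] positions = new double[subset.size()][axes];
        for (int i = 0; i < subset.size(); i++) {
            for (int j = 0; j < axes; j++) {
                double range = max[j] - min[j];
                // A constant column has no spread; place it mid-axis.
                positions[i][j] = range == 0.0
                        ? 0.5
                        : (subset.get(i).continuous[j] - min[j]) / range;
            }
        }
        return positions;
    }

    /** Made-up sample data: one categorical column and two continuous columns. */
    static List<Row> sampleRows() {
        List<Row> rows = new ArrayList<>();
        rows.add(new Row(Map.of("group", "A"), new double[] {1.0, 10.0}));
        rows.add(new Row(Map.of("group", "A"), new double[] {1.5, 12.0}));
        rows.add(new Row(Map.of("group", "B"), new double[] {5.0, 50.0}));
        return rows;
    }

    public static void main(String[] args) {
        List<Row> rows = sampleRows();
        List<Row> subset = filter(rows, Map.of("group", "A"));
        for (double[] polyline : axisPositions(rows, subset)) {
            System.out.println(Arrays.toString(polyline));
        }
    }
}
```

In this sketch the rows selected by "group" = "A" land close together on both axes, which
is the kind of pattern a Parallel Coordinates graph makes visible. Changing the criteria
map corresponds to the user changing the selection in the tool.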
This tool can be used in both desktop and web environments. The class library is
implemented in C# and has also been ported to Java to make it widely available. Other
application developers can use this library to add the same functionality to their
own applications.