Classifying Gene Coexpression Networks Using Discrimination Pattern Mining
Abstract
Several algorithms for graph classification have been proposed. Algorithms that map graphs
into feature vectors encoding the presence or absence of specific subgraphs have shown excellent performance.
Most existing algorithms mine for subgraphs that appear frequently in graphs
belonging to one class and infrequently in the other graphs. Classification of gene coexpression networks
has attracted considerable attention in recent years from researchers in both biology and
data mining because of its many useful applications. Advances in high-throughput technologies,
which provide easy access to large microarray datasets, have necessitated the development of
new techniques that scale well to large datasets and produce highly accurate results. In this
thesis, we propose a novel approach to graph classification based on discriminative patterns: we present
two algorithms for mining discriminative patterns and then use the mined patterns to classify graphs. Experiments
on large coexpression graphs show that the proposed approach performs very well
and scales to graphs with millions of edges. We compare our proposed algorithm with two baseline
algorithms and show that it outperforms both, achieving highly accurate graph classification.
Moreover, we perform topological and biological enrichment analysis
on the discriminative patterns reported by our mining algorithm and show that the reported
patterns are significantly enriched.
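The general idea described above (score subgraphs by how much more often they occur in one class than in the other, then encode each graph as a presence/absence vector over the selected subgraphs) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the two algorithms proposed in this thesis: it assumes graphs are given as sets of undirected gene-gene edges, restricts candidate patterns to single edges, and uses the difference in class supports as a hypothetical discriminative score. The names `make_graph`, `support`, `discriminative_score`, `mine_single_edge_patterns`, and `to_feature_vector` are illustrative only.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # undirected gene-gene edge, endpoints stored in sorted order


def make_graph(pairs: Iterable[Tuple[str, str]]) -> Set[Edge]:
    """Build an edge set, normalizing each undirected edge to (min, max)."""
    return {(u, v) if u <= v else (v, u) for u, v in pairs}


def support(pattern: FrozenSet[Edge], graphs: List[Set[Edge]]) -> float:
    """Fraction of graphs that contain every edge of the pattern."""
    if not graphs:
        return 0.0
    return sum(1 for g in graphs if pattern <= g) / len(graphs)


def discriminative_score(pattern: FrozenSet[Edge],
                         pos: List[Set[Edge]], neg: List[Set[Edge]]) -> float:
    """Assumed score: support in the positive class minus support in the negative class."""
    return support(pattern, pos) - support(pattern, neg)


def mine_single_edge_patterns(pos: List[Set[Edge]], neg: List[Set[Edge]],
                              min_score: float) -> List[FrozenSet[Edge]]:
    """Keep single-edge patterns whose |score| >= min_score, strongest first."""
    candidates = set().union(*pos, *neg)
    scored = []
    for edge in candidates:
        pattern = frozenset([edge])
        score = discriminative_score(pattern, pos, neg)
        if abs(score) >= min_score:
            scored.append((pattern, score))
    # Sort by descending |score|; break ties by edge labels for a deterministic order.
    scored.sort(key=lambda ps: (-abs(ps[1]), sorted(ps[0])))
    return [pattern for pattern, _ in scored]


def to_feature_vector(graph: Set[Edge], patterns: List[FrozenSet[Edge]]) -> List[int]:
    """Binary vector: 1 if the graph contains the pattern, else 0."""
    return [1 if p <= graph else 0 for p in patterns]


if __name__ == "__main__":
    pos = [make_graph([("A", "B"), ("B", "C")]), make_graph([("B", "A"), ("C", "D")])]
    neg = [make_graph([("B", "C"), ("C", "D")]), make_graph([("D", "C")])]
    patterns = mine_single_edge_patterns(pos, neg, min_score=0.5)
    print([sorted(p) for p in patterns])
    print([to_feature_vector(g, patterns) for g in pos + neg])
```

The resulting binary vectors can be passed to any standard classifier. Restricting patterns to single edges keeps the sketch short; mining larger connected subgraphs, as discriminative subgraph miners generally do, requires a pattern-growth search and pruning that this example does not attempt.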