Mining Approximate Frequent Dense Modules from Multiple Gene Expression Datasets
Abstract
A large amount of gene expression data has been collected under various environmental and biological conditions. Extracting dense modules that recur across multiple gene coexpression networks has been shown to be promising for functional gene annotation and biomarker discovery. In this thesis, we propose a biclustering-based approach for mining approximate frequent dense modules. This approach reports a large number of modules, many of which are duplicates. We therefore build on it and propose two extended approaches for mining dense modules, which mine a set of representative patterns using post-processing and on-line pattern summarization methods. The extended approaches report fewer modules and fewer duplicates. Experiments on real gene coexpression networks show that frequent dense modules are biologically interesting, as evidenced by the large percentage of them that are biologically enriched.