Browsing by Author "Wang, Qi"

Integrative Data Analysis of Microarray and RNA-seq
(North Dakota State University, 2018) Wang, Qi
Background: Microarray and RNA sequencing (RNA-seq) have been two commonly used high-throughput technologies for gene expression profiling over the past decades. For global gene expression studies, both techniques are expensive, and each has its own advantages and limitations. Integrative analysis of these two types of data would provide increased statistical power, reduced cost, and complementary technical advantages. However, the completely different mechanisms of the two techniques make the two types of data highly incompatible. Methods: Based on their degrees of compatibility, the genes are grouped into clusters using a novel clustering algorithm called Boundary Shift Partition (BSP). For each cluster, a linear model is fitted to the data, and the number of differentially expressed genes (DEGs) is calculated by running a two-sample t-test on the residuals. The optimal number of clusters is determined using a selection criterion that penalizes the number of parameters used for model fitting. The method was evaluated on data simulated from various distributions and compared with a conventional K-means clustering method, the Hartigan-Wong algorithm. The BSP algorithm was then applied to microarray and RNA-seq data obtained from embryonic heart tissues of wild type mice and Tbx5 mice. The raw data went through multiple preprocessing steps, including data transformation, quantile normalization, linear modeling, principal component analysis, and probe alignment. The differentially expressed genes between wild type and Tbx5 mice were identified using the BSP algorithm. Results: On the simulated data, the accuracy of the BSP algorithm is higher than that of the Hartigan-Wong algorithm for the cases with smaller standard deviations across the five underlying distributions. The BSP algorithm can find the correct number of clusters using the selection criterion. The BSP method identifies 584 differentially expressed genes between the wild type and Tbx5 mice. A core gene network built from the differentially expressed genes revealed a set of key genes known to be important for heart development. Conclusion: The BSP algorithm is an efficient and robust classification method for integrating data obtained from microarray and RNA-seq.
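The per-cluster step in the Methods (fit a linear model, then run a two-sample t-test on the residuals) might be sketched as follows. This is a minimal illustration, not the thesis's implementation: the BSP clustering itself is not reproduced, and the function names, the choice of a simple one-predictor regression, the pooled-variance form of the t statistic, and the toy numbers are all assumptions.

```python
import math
from statistics import mean, variance


def fit_line(x, y):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return a, b


def residuals(x, y, a, b):
    """Observed y minus the value predicted by the fitted line."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def pooled_t_statistic(group1, group2):
    """Two-sample Student t statistic with pooled variance."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))


# Toy values (hypothetical): x = microarray, y = RNA-seq, for one cluster.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 8.4, 10.1, 12.3]
a, b = fit_line(x, y)
res = residuals(x, y, a, b)

# Assumed layout: first three samples are one condition, last three the other.
t_stat = pooled_t_statistic(res[:3], res[3:])
```

The t statistic would then be compared against a t distribution with n1 + n2 - 2 degrees of freedom to decide whether a gene counts as differentially expressed.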
Protein-ligand Docking Application and Comparison using Discovery Studio and AutoDock
(North Dakota State University, 2017) Wang, Qi
Protein-ligand docking is a structure-based computational method used to predict the binding modes and binding affinities of small molecules with protein receptors. The goals of this study are to compare the docking performance of different software and to apply the docking method to predict how the protein fatty acid desaturase 1 (FADS1) interacts with ligands. Two docking programs, Discovery Studio and AutoDock, are compared on 195 protein-ligand complexes from the PDBbind dataset. AutoDock performs slightly better than Discovery Studio on docking percentage, defined as the percentage of the 195 complexes that were docked. On the other hand, Discovery Studio has a higher accuracy (successfully docked complexes, within 5 RMSD of the native complex structures) than AutoDock. The interaction between FADS1 and sesamin shows a pattern similar to the interaction between a homolog of FADS1 and a ligand in a PDB structure (PDB ID 1EUE).
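The two comparison metrics in this abstract can be illustrated with a short sketch. It assumes that ligand poses are given as matched lists of atom coordinates in the same order and reference frame, and that a failed docking run is recorded as `None`. The function names and the toy inputs are hypothetical, and the abstract does not say which denominator the accuracy uses, so the sketch reports only the count of successful poses.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (a - b) ** 2
        for point_a, point_b in zip(coords_a, coords_b)
        for a, b in zip(point_a, point_b)
    )
    return math.sqrt(squared / len(coords_a))


def summarize_docking(rmsds, total, cutoff=5.0):
    """Return (docking percentage, number of poses within the RMSD cutoff).

    rmsds holds one entry per complex; None marks a complex that failed to dock.
    The cutoff is in the same units as the coordinates.
    """
    docked = [r for r in rmsds if r is not None]
    docking_percentage = 100.0 * len(docked) / total
    n_success = sum(1 for r in docked if r <= cutoff)
    return docking_percentage, n_success


# Toy example (hypothetical values): four complexes, one failed to dock.
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
pose = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
pose_rmsd = rmsd(native, pose)
percentage, successes = summarize_docking([pose_rmsd, 6.0, None, 4.9], total=4)
```

Real ligand RMSD calculations usually also need to handle symmetry-equivalent atoms, which this sketch ignores.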
Using Imputed MicroRNA Regulation Based on Weighted Ranked Expression and Putative MicroRNA Targets and Analysis of Variance to Select MicroRNAs for Predicting Prostate Cancer Recurrence
(North Dakota State University, 2014) Wang, Qi
Imputed microRNA regulation based on weighted ranked expression and putative microRNA targets (IMRE) is a method for predicting microRNA regulation from genome-wide gene expression. For each microRNA, a false discovery rate (FDR) is calculated from the expression of its putative targets to assess differences in regulation, and hence in gene expression, between conditions. The dataset used in this research is the microarray gene expression of 596 patients with prostate cancer. This dataset includes three different phenotypes: PSA (Prostate-Specific Antigen recurrence), Systemic (Systemic Disease Progression), and NED (No Evidence of Disease). We used the IMRE and ANOVA methods to analyze the dataset and identified several microRNA candidates that can be used to predict PSA recurrence and systemic disease progression in prostate cancer patients.
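As an illustration of the ANOVA step, a one-way F statistic across the three phenotype groups could be computed as below. This is a generic textbook sketch, not the thesis's pipeline: the IMRE scoring is not reproduced, and the function name and the toy scores are assumptions.

```python
from statistics import mean


def one_way_anova_f(groups):
    """One-way ANOVA; returns (F statistic, df between groups, df within groups)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean(v for g in groups for v in g)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy scores (hypothetical) for one microRNA in the PSA, Systemic, and NED groups.
psa = [1.0, 2.0, 3.0]
systemic = [2.0, 3.0, 4.0]
ned = [5.0, 6.0, 7.0]
f_stat, df_b, df_w = one_way_anova_f([psa, systemic, ned])
```

A large F statistic relative to the F distribution with the returned degrees of freedom suggests that the microRNA's score differs between at least two of the phenotype groups.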