Browsing by Author "Soumare, Ibrahim"

Comparing Performance of ANOVA to Poisson and Negative Binomial Regression When Applied to Count Data
(North Dakota State University, 2020) Soumare, Ibrahim
Analysis of Variance (ANOVA) is one of the simplest and most widely used models in statistics. However, ANOVA requires a set of assumptions to hold for the model to be a valid choice and for the inferences to be accurate. Among others, ANOVA assumes that the data are normally distributed and have homogeneous variance. Data from most disciplines, however, do not meet the assumption of normality and/or equal variance. Regrettably, researchers do not always check whether the assumptions are met, and if they are violated, the inferences may well be wrong. We conducted a simulation study to compare the performance of standard ANOVA to Poisson and Negative Binomial regression models when applied to count data. We considered different combinations of sample sizes and underlying distributions. In this simulation study, we first assessed the Type I error of each model involved. We then compared power as well as the quality of the estimated parameters across the models.
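As a rough illustration of the kind of Type I error simulation the abstract describes, the Python sketch below applies a one-way ANOVA F test to Poisson counts generated under the null hypothesis and records how often the test rejects. The design is an assumption made for this example and is not taken from the thesis: 3 groups, 10 observations per group, a common Poisson mean of 5, and a nominal 5% level. The critical value is hard-coded from standard F tables for that design only, and the function names are hypothetical.

```python
import math
import random

# Assumed design (not from the abstract): 3 groups of 10 counts each,
# all drawn from the same Poisson distribution, tested at the 5% level.
N_GROUPS = 3
N_PER_GROUP = 10
F_CRIT_2_27 = 3.354  # upper 5% point of F(2, 27) from standard tables


def poisson(lam, rng):
    """Draw one Poisson(lam) count using Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        k += 1
        p *= rng.random()
        if p <= limit:
            return k - 1


def anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups."""
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        # Degenerate case: no within-group variation at all.
        return math.inf if ss_between > 0 else 0.0
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def type1_error(n_sims=2000, lam=5.0, seed=1):
    """Estimate the rejection rate of the F test when the null is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        groups = [
            [poisson(lam, rng) for _ in range(N_PER_GROUP)]
            for _ in range(N_GROUPS)
        ]
        if anova_f(groups) > F_CRIT_2_27:
            rejections += 1
    return rejections / n_sims


if __name__ == "__main__":
    print("Estimated Type I error:", type1_error())
```

An estimate close to 0.05 would indicate that the test holds its nominal level under this particular setup. Comparing against Poisson or Negative Binomial regression, as the thesis does, would additionally require fitting those models, which this sketch does not attempt.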
Performance of Permutation Tests Using Simulated Genetic Data
(North Dakota State University, 2022) Soumare, Ibrahim
Disease statuses and biological conditions are known to be greatly affected by differences in gene expression levels. A common challenge in RNA-seq data analysis is to identify genes whose mean expression levels change across different groups of samples or, more generally, are associated with one or more variables of interest. Such analysis is called differential expression analysis. Many tools have been developed for analyzing differential gene expression (DGE) in RNA-seq data. RNA-seq data are represented as counts. Typically, a generalized linear model with a log link and a negative binomial response is fit to the count data for each gene, and differentially expressed (DE) genes are identified by testing, for each gene, whether a model parameter or a linear combination of model parameters is zero. We conducted a simulation study to compare the performance of our proposed modified permutation test to DESeq2, edgeR, Limma, LFC and Voom when applied to RNA-seq data. We considered different combinations of sample sizes and underlying distributions. In this simulation study, we first simulated data using Monte Carlo simulation in SAS and assessed the true detection rate and false positive rate for each method involved. We then simulated data from real RNA-seq data using the SimSeq algorithm and compared the performance of our proposed test to DESeq2, edgeR, Limma, LFC and Voom. The simulation results suggest that permutation tests are a competitive alternative to traditional parametric methods for analyzing RNA-seq data when sample sizes are sufficient. Specifically, the results show that the permutation test controlled Type I error fairly well and had comparable power. Moreover, for sample sizes n ≥ 10, the permutation test showed a comparable true detection rate and consistently kept the false positive rate very low when sampling from Poisson and Negative Binomial distributions. Likewise, the results from SimSeq confirm that permutation tests keep the false positive rate lowest among the methods compared.
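For readers unfamiliar with permutation tests, the Python sketch below shows a plain two-group permutation test on the counts for a single gene, using the absolute difference in group means as the test statistic. It is not the modified permutation test proposed in the thesis, whose details the abstract does not give. The function name, the choice of statistic, the number of permutations, and the example counts are all assumptions made for illustration.

```python
import random
from statistics import mean


def permutation_pvalue(group_a, group_b, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in group means.

    Hypothetical helper: group labels are reshuffled n_perm times and the
    shuffled statistic is compared with the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Made-up counts for one gene in two conditions, 10 samples each.
    control = [0, 1, 2, 1, 0, 2, 1, 0, 1, 2]
    treated = [10, 12, 11, 13, 9, 12, 10, 11, 14, 12]
    print("p-value:", permutation_pvalue(control, treated))
```

In a differential expression setting, a test of this kind would be applied gene by gene. Because it relies only on reshuffling group labels, it makes no assumption that the counts follow a Poisson or negative binomial distribution.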