Statistics Doctoral Work
Permanent URI for this collection: hdl:10365/32399
Browsing Statistics Doctoral Work by Issue Date
Now showing 1 - 20 of 35
Item: Model Validation and Diagnostics in Right Censored Regression (North Dakota State University, 2013). Miljkovic, Tatjana.

When censored data are present in the linear regression setting, the Expectation-Maximization (EM) algorithm and the Buckley and James (BJ) method are two algorithms that can be implemented to fit the regression model. We focus our study on the EM algorithm because it is easier to implement than the BJ algorithm and it uses common assumptions in regression theory, such as normally distributed errors. The BJ algorithm, however, is used for comparison purposes in benchmarking the EM parameter estimates, their variability, and model selection. In this dissertation, validation and influence diagnostic tools are proposed for right censored regression using the EM algorithm. These tools include a reconstructed coefficient of determination, a test for outliers based on the reconstructed Jackknife residual, and influence diagnostics with one-step deletion. To validate the proposed methods, extensive simulation studies are performed to compare the performance of the EM and BJ algorithms in parameter estimation for data with different error distributions, proportions of censored data, and sample sizes. Sensitivity analysis for the reconstructed coefficient of determination is developed to show how the EM algorithm can be used in model validation for different amounts of censoring and locations of the censored data. Additional simulation studies show the capability of the EM algorithm to detect outliers for different types of outliers (uncensored and censored), proportions of censored data, and locations of outliers. The proposed formula for the one-step deletion method is validated with an example and a simulation study. Additionally, this research proposes a novel application of the EM algorithm for modeling right censored regression in the area of actuarial science. Both the EM and BJ algorithms are utilized in modeling health benefit data provided by the North Dakota Department of Veterans Affairs (ND DVA). The proposed model validation and diagnostic tools are applied using the EM algorithm. Results of this study can be of great benefit to government policy makers and pricing actuaries.
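To illustrate the kind of EM fitting this abstract describes, the sketch below imputes each right-censored response with its truncated-normal conditional mean and refits ordinary least squares. It is a minimal illustration under a normal-errors assumption, not the dissertation's implementation; in particular the variance update is simplified.

```python
# Minimal EM-style sketch for a normal linear model with right-censored responses.
# E-step: replace each censored response by its conditional mean under the current
# fit (truncated-normal expectation). M-step: refit ordinary least squares.
# Illustrative only; the sigma update here is a simplification.
import numpy as np
from scipy.stats import norm

def em_censored_regression(X, y, censored, n_iter=50):
    """X: (n, p) design matrix; y: observed values or censoring points;
    censored: boolean array, True where y is a right-censoring point."""
    y_work = y.astype(float).copy()
    beta = np.linalg.lstsq(X, y_work, rcond=None)[0]
    sigma = np.std(y_work - X @ beta)
    for _ in range(n_iter):
        mu = X @ beta
        # E-step: E[Y | Y > c] = mu + sigma * phi(a) / (1 - Phi(a)), with a = (c - mu) / sigma
        a = (y[censored] - mu[censored]) / sigma
        y_work[censored] = mu[censored] + sigma * norm.pdf(a) / norm.sf(a)
        # M-step: ordinary least squares on the completed responses
        beta = np.linalg.lstsq(X, y_work, rcond=None)[0]
        sigma = np.std(y_work - X @ beta)
    return beta, sigma
```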
Item: Proposed Nonparametric Tests for the Simple Tree Alternative in a Mixed Design (North Dakota State University, 2014). Olet, Susan.

For the general alternative, many test statistics exist for dependent and independent variables. However, no documented test statistics exist for the simple tree alternative for dependent variables, independent variables, and mixed designs that consider both dependent and independent variables. This research proposes six nonparametric test statistics for a mixed design that consists of observations from a Randomized Complete Block Design (RCBD) and a Completely Randomized Design (CRD). A simulation was conducted to compare the proposed test statistics under five conditions: changing the number of treatments, varying the underlying distribution, increasing the variance between the RCBD and CRD portions, changing the proportion of the RCBD portion to the CRD portion, and changing the shift configurations for the treatment effects. The simulation results indicate that Approach II and Approach VI had the highest powers overall. Approach II assigns equal weight to the standardized modified Fligner-Wolfe and the standardized modified Page's test statistics. Approach VI assigns more weight, in proportion to the sample size, to the standardized modified Fligner-Wolfe statistic (the CRD portion) and less weight, reflecting the smaller number of blocks, to the standardized modified Page's test statistic (the RCBD portion of the mixed design). It was noted that, when the sample size was greater than the number of blocks and the RCBD and CRD variances were equal, Approach VI had the highest powers. On the other hand, when the variance in the CRD was greater than the variance in the RCBD, Approach II had the highest powers. Also, when the number of blocks for the RCBD portion was greater than the sample size for the CRD portion, Approach II had the highest powers when the variance in the CRD portion was equal to the variance in the RCBD portion; when the variance in the CRD portion was greater than the variance in the RCBD portion, Approach VI had the highest powers.

Item: Proposed Nonparametric Tests for the Simple Tree Alternative in a Mixed Design (North Dakota State University, 2014). Olet, Susan.

Video summarizing the Ph.D. dissertation for a non-specialist audience.

Item: Nonparametric Tests for the Non-Decreasing and Alternative Hypotheses for the Incomplete Block and Completely Randomized Mixed Design (North Dakota State University, 2014). Ndungu, Alfred Mungai.

This research study proposes a solution for dealing with missing observations, a common problem in real-world datasets. A nonparametric approach is used because of its ease of use relative to the parametric approach, which beleaguers the user with firm assumptions. The study assumes the data follow a mixed design combining an Incomplete Block Design (IBD) and a Completely Randomized Design (CRD). The scope of this research was limited to three, four and five treatments. Mersenne-Twister (2014) simulations were used to vary the design and to estimate the test statistic powers. Two test statistics are proposed for the case where the user expects a non-decreasing order of differences in treatment means; both are applicable in the cited mixed design. The tests combine the Alvo and Cabilio (1995) statistic and the Jonckheere-Terpstra statistic (Jonckheere, 1954; Terpstra, 1952) in two ways: standardizing the sum of the standardized statistics and standardizing the sum of the unstandardized statistics. Results showed that the former is better. Three tests are proposed for the umbrella alternative. The first, Mungai's test, is only applicable in an IBD. The other two tests combine Mungai's test and the Mack-Wolfe (1981) test using the same two methods described above. The same conclusion holds except when the size of the IBD's sample was equal to or greater than a quarter that of the CRD.
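The two ways of combining the two design portions that recur in this abstract and in several later ones can be written compactly. The sketch below is purely illustrative; the function names and arguments are placeholders rather than any dissertation's notation, and it assumes the two component statistics are independent with known null means and variances.

```python
# Two generic ways of pooling a CRD-portion statistic and a blocked-portion statistic
# into one mixed-design test statistic (illustrative placeholders, not a specific test).
import numpy as np

def combine_standardized(t_crd, mean_crd, sd_crd, t_blk, mean_blk, sd_blk):
    # "standardize the sum of the standardized statistics"
    z_crd = (t_crd - mean_crd) / sd_crd
    z_blk = (t_blk - mean_blk) / sd_blk
    return (z_crd + z_blk) / np.sqrt(2.0)   # sum of two independent N(0, 1) components

def combine_unstandardized(t_crd, mean_crd, var_crd, t_blk, mean_blk, var_blk):
    # "standardize the sum of the unstandardized statistics"
    total = t_crd + t_blk
    return (total - (mean_crd + mean_blk)) / np.sqrt(var_crd + var_blk)
```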
Item: Two Approaches to the Isotonic Change-Point Problem: Nonparametric and Minimax (North Dakota State University, 2014). D'Silva, Karl.

A change in model parameters over time often characterizes major events. Situations in which this may arise include observing increasing temperatures, intense rainfall, and the valuation of a stock. The question is whether these observations are simply the result of natural variation, or rather are indicative of an underlying monotonic trend. This is known as the isotonic change-point problem. Two approaches to this problem are considered. Firstly, for correlated data with short-range dependence, we prove that a particular U-statistic based on a modified version of the Jonckheere-Terpstra test statistic is asymptotically equivalent to a more complex U-statistic discussed by Shen and Xu (2013), one that has been shown to outperform other existing tests in a variety of situations. Secondly, we justify and utilize the minimax criterion in order to identify the optimal test statistic within a specified class. We show that, as motivated by the projection method, the aforementioned class is the class of contrasts. It is proven that the set of coefficients originally proposed by Abelson and Tukey (1963), and utilized by Brillinger (1989) in the isotonic change-point setting, is in fact minimax in the independent data case. For correlated data with short-range dependence, we demonstrate a sufficient condition for minimaxity to hold.

Item: A Study of Influential Statistics Associated with Success in the National Football League (North Dakota State University, 2015). Roith, Joseph Michael.

This dissertation considers the most important aspects of success in the National Football League (NFL). Success is defined, for this paper, as winning individual games in the short term and making the playoffs over the course of a season in the long term. Data were collected for 750 regular season games over the course of five NFL seasons and used to create models that identify the factors most significant for winning at both the short-term and long-term levels. A point spread model was developed using an ordinary least squares regression method, with a stepwise selection technique to reduce the number of variables included. Logistic regression models were also created to estimate the probability that a team will win an individual game, and the probability that a team will make the playoffs at the end of the season. Discriminant analysis was performed to compare the significant variables in our models and determine which had the largest influence. We considered the relationship between offense and defense in the NFL to conclude whether or not one area has a significant advantage over the other. We also fit a proportional odds model on the data set to categorize blowout games and those that are close at the end. The overwhelming presence of turnover margin, passing efficiency, first down margin, and sack yardage in all of our models is clear evidence that a handful of statistics can explain success in the NFL. Using the statistics from games, we were able to correctly identify the winner around 88% of the time. Finally, we used simulations and historical team performances to forecast future game outcomes; our models classified the actual winner with a 71% accuracy rate. Analytics are slowly gaining momentum in football, and the advantages are clear. Quantifying success in the NFL can benefit both individual teams and the league as a whole in presenting the best possible product to their audiences.
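As a rough illustration of the logistic win-probability modeling described above, the following sketch fits a logit model on margin-type predictors of the kind the abstract highlights (turnover, passing efficiency, first downs, sack yardage). The data are simulated placeholders, not the dissertation's 750-game data set.

```python
# Illustrative logistic win model on simulated in-game margins (placeholder data).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n_games = 750
X = np.column_stack([
    rng.normal(0, 1.5, n_games),    # turnover margin
    rng.normal(0, 2.0, n_games),    # passing-efficiency margin
    rng.normal(0, 5.0, n_games),    # first-down margin
    rng.normal(0, 20.0, n_games),   # sack-yardage margin
])
true_logit = 0.8 * X[:, 0] + 0.4 * X[:, 1] + 0.1 * X[:, 2] + 0.02 * X[:, 3]
home_win = rng.binomial(1, 1.0 / (1.0 + np.exp(-true_logit)))

win_model = sm.Logit(home_win, sm.add_constant(X)).fit(disp=0)
print(win_model.params)                                          # fitted log-odds coefficients
print((win_model.predict(sm.add_constant(X)) > 0.5).mean())      # in-sample hit rate
```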
Item: Boundary Estimation (North Dakota State University, 2015). Mu, Yingfei.

Existing statistical methods do not provide a satisfactory solution for determining the spatial pattern in spatially referenced data, which is often required by research in many areas, including geology, agriculture, forestry, marine science and epidemiology, for identifying the source of the unusual environmental factors associated with a certain phenomenon. This work provides a novel algorithm which can be used to delineate the boundary of an area of hot spots accurately and efficiently. Our algorithm, first of all, does not assume any pre-specified geometric shape for the change-curve. Secondly, the computational complexity of our novel algorithm for change-curve detection is of the order O(n^2), which is much smaller than the 2^(O(n^2)) required by the CUSP algorithm proposed in Müller & Song [8] and by Carlstein's [2] estimators. Furthermore, our novel algorithm yields a consistent estimate of the change-curve as well as of the underlying distribution mean of the observations in the regions. We also study the hypothesis test of the existence of the change-curve under independence of the spatially referenced data. We then provide simulation studies as well as a real case study to compare our algorithm with the popular boundary estimation method, the spatial scan statistic.

Item: Where do the Differences Lie?: An Analysis of Distance Road Running Populations (North Dakota State University, 2015). Johnson, Jennifer Elizabeth.

Recently, much research has been focused on the gap in performance between male and female runners. Our research examines the gap between these two running populations in depth to determine where the specific differences are located. We investigate three marathons that require some form of qualification before one can participate, along with thirty-two races that do not require qualification. For the qualifying marathons, we examine whether the proportions of male and female finishers are equal at predetermined levels of performance. We also examine the overall descriptive statistics and the age group patterns for each marathon. The non-qualifying races are equally divided among the four popular running distances: marathon, half marathon, ten kilometer (10k) and five kilometer (5k). The proportions of male and female race finishers were tested for equality for each individual race at three different levels of performance. To further inspect the population differences, we tested the equality of the distributions through comparisons of the means, medians, and variances between the two populations. We also examined whether or not the differences followed a specific pattern by investigating the age groups. All results for the individual races were combined using meta-analysis, both at the overall race level and for each age group at all four race distances. We performed a separate meta-analysis for the qualifying marathons and for the non-qualifying races. Several differences between the male and female populations of distance runners were discovered through our research. These inequalities were not what we expected to see when we began this study.
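A basic building block of the comparisons just described is a test that the proportions of male and female finishers beating a performance cutoff are equal in a given race. The sketch below runs such a two-proportion z-test with made-up counts; it only illustrates the type of test involved, not the thesis's actual data or analysis.

```python
# Two-proportion z-test sketch: are male and female finishers equally likely to beat a cutoff?
# Counts are hypothetical placeholders.
from statsmodels.stats.proportion import proportions_ztest

finishers_under_cutoff = [412, 268]   # males, females beating the cutoff (hypothetical)
total_finishers = [1500, 1300]        # total male, female finishers (hypothetical)
stat, p_value = proportions_ztest(finishers_under_cutoff, total_finishers)
print(f"z = {stat:.2f}, p = {p_value:.4f}")
```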
Item: A Comparison of False Discovery Rate Method and Dunnett's Test for a Large Number of Treatments (North Dakota State University, 2015). Gomez, Kayeromi Donoukounmahou.

It has become quite common to perform multiple tests simultaneously in order to detect differences in a certain trait among groups. This often leads to an inflated probability of at least one Type I Error, a rejection of a null hypothesis when it is in fact true. This inflation generally leads to a loss of power of the test, especially in multiple testing and multiple comparisons. The aim of the research is to use simulation to address what a researcher should do to determine which treatments are significantly different from the control when there is a large number of treatments and the number of replicates in each treatment is small. We examine two situations in this simulation study: when the number of replicates per treatment is 3 and when it is 5; in each situation, we simulated from a normal distribution and from a mixture of normal distributions. The total number of simulated treatments was progressively increased from 50 to 100, then 150, and finally 300. The goal is to measure the change in the performance of the False Discovery Rate method and Dunnett's test, in terms of Type I Error and power, as the total number of treatments increases. We report two ways of examining Type I Error and power: first, we look at the performance of the two tests in relation to all other comparisons in our simulation study, and second, per simulated sample. In the first assessment, the False Discovery Rate method appears to have higher power while keeping its Type I Error in the same neighborhood as Dunnett's test; in the latter, both tests have similar powers and the False Discovery Rate method has a higher Type I Error. Overall, the results show that when the objective of the researcher is to detect as many of the differences as possible, the FDR method is preferred. However, if Type I Error is more detrimental to the outcomes of the research, Dunnett's test offers a better alternative.
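For readers who want the mechanics of the FDR side of this comparison, the sketch below simulates many treatment-versus-control t-tests with three replicates and applies the Benjamini-Hochberg step-up rule. The simulation settings are placeholders, and the Dunnett procedure (which needs a multivariate-t critical value) is not reproduced here.

```python
# Benjamini-Hochberg FDR sketch for many treatment-vs-control t-tests (simulated data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_rep = 3
control = rng.normal(0.0, 1.0, n_rep)
effects = np.r_[np.full(20, 3.0), np.zeros(80)]          # 20 shifted treatments, 80 null
p_values = np.array([
    stats.ttest_ind(rng.normal(mu, 1.0, n_rep), control).pvalue for mu in effects
])

def benjamini_hochberg(p, q=0.05):
    """Return a boolean array marking the hypotheses rejected at FDR level q."""
    m = len(p)
    order = np.argsort(p)
    thresholds = q * np.arange(1, m + 1) / m              # i * q / m
    below = np.nonzero(p[order] <= thresholds)[0]
    k = below.max() + 1 if below.size else 0              # largest i with p_(i) <= i*q/m
    rejected = np.zeros(m, dtype=bool)
    rejected[order[:k]] = True
    return rejected

print("treatments declared different from control:", benjamini_hochberg(p_values).sum())
```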
Item: Comparing Several Modeling Methods on NCAA March Madness (North Dakota State University, 2015). Hua, Su.

This year (2015), according to the American Gaming Association's (AGA) research, nearly 40 million people filled out about 70 million March Madness brackets (Moyer, 2015). Their objective is to correctly predict the winners of each game. This paper used the probability self-consistent (PSC) model (Shen, Hua, Zhang, Mu, Magel, 2015) to predict all 63 games in the NCAA Men's Division I Basketball Tournament. The PSC model was first introduced by Zhang (2012). The Logit link was used in Zhang's (2012) paper to connect only five covariates with the conditional probability of a team winning a game given its rival team. In this work, we incorporated fourteen covariates into the model. In addition, we used another link function, the Cauchit link, in the model to make the predictions. Empirical results show that the PSC model with the Cauchit link has better average performance in both simple and double scoring than the Logit link during the last three years of tournament play. In the generalized linear model, maximum likelihood estimation is a popular method for estimating the parameters; however, convergence failures may happen when using high-dimensional covariates in the model (Griffiths, Hill, Pope, 1987). Therefore, in the second phase of this study, Bayesian inference is used for estimating the parameters in the prediction model. Bayesian estimation incorporates prior information, such as experts' opinions and historical results, in the model. Predictions from three years of March Madness using the model obtained from Bayesian estimation with the Logit link are compared to predictions using the model obtained from maximum likelihood estimation.

Item: Identification of Differentially Expressed Genes When the Distribution of Effect Sizes is Asymmetric in Two Class Experiments (North Dakota State University, 2017). Kotoka, Ekua Fesuwa.

High-throughput RNA sequencing (RNA-Seq) has emerged as an innovative and powerful technology for detecting differentially expressed (DE) genes across different conditions. Unlike continuous microarray data, RNA-Seq data consist of discrete read counts mapped to a particular gene. Most proposed methods for detecting DE genes from RNA-Seq are based on statistics that compare normalized read counts between conditions. However, most of these methods do not take into account potential asymmetry in the distribution of effect sizes. In this dissertation, we propose methods to detect DE genes when the distribution of the effect sizes is observed to be asymmetric. These proposed methods improve detection of differential expression compared to existing methods. Chapter 3 proposes two new methods that modify an existing nonparametric method, Significance Analysis of Microarrays with emphasis on RNA-Seq data (SAMseq), to account for the asymmetry in the distribution of the effect sizes. Results of the simulation studies indicate that the proposed methods, compared to the SAMseq method, identify more DE genes while adequately controlling the false discovery rate (FDR). Furthermore, the use of the proposed methods is illustrated by analyzing a real RNA-Seq data set containing two different mouse strain samples. In Chapter 4, additional simulation studies are performed to show that one of the proposed methods, compared with other existing methods, provides better power for identifying truly DE genes or more sufficiently controls FDR in most settings where asymmetry is present. Chapter 5 compares the performance of the parametric methods DESeq2, NBPSeq and edgeR when asymmetric effect sizes exist and the analysis takes this asymmetry into account. Through simulation studies, the performance of these methods is compared to the traditional BH and q-value methods in the identification of DE genes. This research proposes a new method that modifies these parametric methods to account for asymmetry found in the distribution of effect sizes. The use of these parametric methods and the proposed method is likewise illustrated by analyzing a real RNA-Seq data set containing two different mouse strain samples. Lastly, overall conclusions are given in Chapter 6.
Item: Predicting the Outcomes of NCAA Women's Sports (North Dakota State University, 2017). Wang, Wenting.

Sports competitions provide excellent opportunities for model building and for using basic statistical methodology in an interesting way. More attention has been paid, and more research conducted, pertaining to men's sports as opposed to women's sports. This paper focuses on three women's sports: NCAA women's basketball, volleyball and soccer. Several ordinary least squares models were developed that help explain the variation in point spread of a women's basketball, volleyball or soccer game based on in-game statistics. Several logistic models were also developed that help estimate the probability that a particular team will win the game for the women's basketball, volleyball and soccer tournaments. Ordinary least squares models for Round 1, Round 2 and Rounds 3-6, with point spread as the dependent variable and with differences in ranks of seasonal averages and differences of seasonal averages as predictors, were developed to predict winners of games in each of those rounds of the women's basketball, volleyball and soccer tournaments. Logistic models for Round 1, Round 2 and Rounds 3-6 that estimate the probability of a team winning the game, using differences in ranks of seasonal averages and differences of seasonal averages, were developed to predict winners of games in each of those rounds of the basketball, volleyball and soccer tournaments. The prediction models were validated before being used for prediction. For basketball, the least squares model developed using differences in ranks of seasonal averages with a double scoring system variable predicted the results of 76.2% of the games for the entire tournament, with all predictions made before the start of the tournament. For volleyball, the logistic model developed using differences of seasonal averages predicted 65.1% of the games for the entire tournament. For soccer, the logistic regression model developed using differences of seasonal averages predicted 45% of all games in the tournament correctly when all 6 rounds were predicted before the tournament began. In this case, a team predicted to win in the second round or higher might not have even made it to that round, since predictions were made ahead of time.
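As a sketch of the Round-1 style point-spread models described above, the following fits ordinary least squares to simulated differences of seasonal averages and calls the game for whichever team has a positive predicted spread. The feature names and numbers are placeholder assumptions, not tournament data.

```python
# Point-spread OLS sketch on simulated team-difference features (placeholders).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n_games = 64
seed_diff = rng.integers(-15, 16, n_games)         # difference in seeds
fg_pct_diff = rng.normal(0, 0.05, n_games)         # difference in field-goal percentage
rebound_diff = rng.normal(0, 4.0, n_games)         # difference in rebounds per game
X = sm.add_constant(np.column_stack([seed_diff, fg_pct_diff, rebound_diff]))
point_spread = -1.2 * seed_diff + 60 * fg_pct_diff + 0.8 * rebound_diff + rng.normal(0, 8, n_games)

spread_model = sm.OLS(point_spread, X).fit()
team1_predicted_to_win = spread_model.predict(X) > 0   # positive predicted spread -> pick team 1
print(spread_model.params, team1_predicted_to_win.mean())
```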
Item: Integrative Data Analysis of Microarray and RNA-seq (North Dakota State University, 2018). Wang, Qi.

Background: Microarray and RNA sequencing (RNA-seq) have been two commonly used high-throughput technologies for gene expression profiling over the past decades. For global gene expression studies, both techniques are expensive, and each has its unique advantages and limitations. Integrative analysis of these two types of data would provide increased statistical power, reduced cost, and complementary technical advantages. However, the completely different mechanisms of the high-throughput techniques make the two types of data highly incompatible. Methods: Based on their degrees of compatibility, the genes are grouped into different clusters using a novel clustering algorithm, called Boundary Shift Partition (BSP). For each cluster, a linear model is fitted to the data and the number of differentially expressed genes (DEGs) is calculated by running a two-sample t-test on the residuals. The optimal number of clusters can be determined using a selection criterion that is penalized on the number of parameters for model fitting. The method was evaluated using data simulated from various distributions and compared with the conventional K-means clustering method, Hartigan-Wong's algorithm. The BSP algorithm was applied to the microarray and RNA-seq data obtained from the embryonic heart tissues of wild type mice and Tbx5 mice. The raw data went through multiple preprocessing steps, including data transformation, quantile normalization, linear modeling, principal component analysis and probe alignment. The differentially expressed genes between wild type and Tbx5 mice are identified using the BSP algorithm. Results: The accuracies of the BSP algorithm for the simulated data are higher than those of Hartigan-Wong's algorithm for the cases with smaller standard deviations across the five different underlying distributions. The BSP algorithm can find the correct number of clusters using the selection criterion. The BSP method identifies 584 differentially expressed genes between the wild type and Tbx5 mice. A core gene network developed from the differentially expressed genes showed a set of key genes that are known to be important for heart development. Conclusion: The BSP algorithm is an efficient and robust classification method for integrating the data obtained from microarray and RNA-seq.

Item: Bayesian Lasso Models – With Application to Sports Data (North Dakota State University, 2018). Gao, Di.

Several statistical models have been proposed to fulfill the objective of correctly predicting the winners of sports games, for example, the generalized linear model (Magel & Unruh, 2013) and the probability self-consistent model (Shen et al., 2015). This work studied Bayesian Lasso generalized linear models. A hybrid estimation approach combining full and Empirical Bayesian methods was proposed. A simple and efficient method in the EM step, which does not require the sample mean from the random samples, was also introduced; the expectation step was reduced to deriving the theoretical expectation directly from the conditional marginal. The findings of this work suggest that future applications will significantly cut down the computational load. Due to the desirable geometric property of the Lasso (Tibshirani, 1996), the Lasso method provides sharp power in selecting significant explanatory variables and has become very popular in solving big data problems in the last 20 years. This work is built on the Lasso structure and hence is also a good fit for achieving dimension reduction. Dimension reduction is necessary when the number of observations is less than the number of parameters or when the design matrix is not of full rank. A simulation study was conducted to test the power of dimension reduction and the accuracy and variation of the estimates. For an application of the Bayesian Lasso probit linear regression to live data, NCAA March Madness (Men's Basketball Division I) was considered. In the end, the predicted bracket was compared with the real tournament results, and the model performance was evaluated by a bracket scoring system (Shen et al., 2015).
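The Bayesian Lasso itself requires a dedicated sampler or EM routine; as a rough frequentist stand-in for the selection and shrinkage role it plays in this abstract, the sketch below fits an L1-penalized logistic model to simulated game features and counts the coefficients that survive. Everything in it (data, penalty strength C) is a placeholder assumption, not the dissertation's model.

```python
# L1-penalized logistic regression as a shrinkage/selection illustration (simulated data).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 30))                       # 30 candidate team-difference features
true_beta = np.r_[1.5, -1.0, 0.8, np.zeros(27)]      # only 3 features truly matter
y = (X @ true_beta + rng.logistic(size=500) > 0).astype(int)

lasso_logit = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
lasso_logit.fit(X, y)
print("nonzero coefficients:", int(np.sum(lasso_logit.coef_ != 0)))
```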
Item: Adaptive Two-Stage Optimal Design for Estimating Multiple EDps under the 4-Parameter Logistic Model (North Dakota State University, 2018). Zhang, Anqing.

In dose-finding studies, c-optimal designs provide the most efficient design for studying a target dose of interest. However, there is no guarantee that a c-optimal design that works best for estimating one specific target dose still performs well for estimating other target doses. Considering the demand for estimating multiple target dose levels, the robustness of the optimal design becomes important. In this study, the 4-parameter logistic model is adopted to describe dose-response curves. Under nonlinear models, the optimal design truly depends on the pre-specified nominal parameter values. If the pre-specified values of the parameters are not close to the true values, optimal designs become far from optimum. In this research, I study an optimal design that works well for estimating multiple EDps and for unknown parameter values. To address this parameter uncertainty, a two-stage design technique is adopted using two different approaches. One approach is to utilize a design augmentation at the second stage; the other is to apply a Bayesian paradigm to find the optimal design at the second stage. For the Bayesian approach, one challenging task is the heavy computation required in the numerical search for the Bayesian optimal design. To overcome this problem, a clustering method can be applied. These two-stage design strategies are applied to construct a robust optimal design for estimating multiple EDps. Through a simulation study, the proposed two-stage optimal designs are compared with the traditional uniform design and the enhanced uniform design to see how well they perform in estimating multiple EDps when the parameter values are mis-specified.
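Parameterizations of the 4-parameter logistic curve vary across references; the sketch below uses one common form and the closed-form EDp (the dose giving p% of the maximum response) that follows from it. The symbols and parameter values are an illustrative choice, not necessarily the dissertation's.

```python
# One common 4PL parameterization and its closed-form ED_p (illustrative values).
import numpy as np

def four_pl(x, lower, upper, slope, ed50):
    """Mean response at dose x under a 4-parameter logistic curve."""
    return lower + (upper - lower) / (1.0 + (x / ed50) ** (-slope))

def ed_p(p, slope, ed50):
    """Dose at which the response reaches p% of the way from lower to upper asymptote."""
    return ed50 * (p / (100.0 - p)) ** (1.0 / slope)

doses = np.array([0.1, 0.5, 1.0, 2.0, 5.0])
print(four_pl(doses, lower=0.0, upper=1.0, slope=1.5, ed50=1.0))
print([round(ed_p(p, slope=1.5, ed50=1.0), 3) for p in (25, 50, 75)])  # multiple ED_p targets
```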
Item: Conditional Random Field with Lasso and its Application to the Classification of Barley Genes Based on Expression Level Affected by Fungal Infection (North Dakota State University, 2019). Liu, Xiyuan.

The classification problem of gene expression level, more specifically, gene expression analysis, is a major research area in statistics. There are several classical methods for solving the classification problem. To apply the Logistic Regression Model (LRM) and other classical methods, the observations in the dataset should satisfy the assumption of independence; that is, the observations in the dataset are independent of each other, and the predictors (independent variables) should be independent. These assumptions are usually violated in gene expression analysis. Although the classical Hidden Markov Model (HMM) can handle dependence among observations, the classical HMM requires that the independent variables in the dataset be discrete and independent. Unfortunately, the gene expression level is a continuous variable. To solve the classification problem for gene expression level data, the Conditional Random Field (CRF) is introduced. Finally, the Least Absolute Shrinkage and Selection Operator (LASSO) penalty, a dimension reduction method, is introduced to improve the CRF model.

Item: Measuring Performance of United States Commercial and Domestic Banks and its Impact on 2007-2009 Financial Crisis (North Dakota State University, 2019). Sakouvogui, Kekoura.

In the analysis of efficiency measures, the statistical Stochastic Frontier Analysis (SFA) and linear programming Data Envelopment Analysis (DEA) estimators have been widely applied. This dissertation is centered on two main goals. First, it addresses the individual limitations of the SFA and DEA models in Chapters 2 and 3, respectively, using Monte Carlo (MC) simulations. Motivated by the lack of justification for the choice of inefficiency distributions in MC simulations, Chapter 2 develops the statistical parameters, i.e., the mean and standard deviation, of the inefficiency distributions: truncated normal, half normal, and exponential. MC simulation results show that, within both the conventional and proposed approaches, misspecification of the inefficiency distribution matters. More precisely, within the proposed approach, the misspecified truncated normal SFA model provides the smallest mean absolute deviation and mean squared error when the inefficiency distribution is half normal. Chapter 3 examines several misspecifications of the DEA efficiency measures while accounting for the stochastic inefficiency distributions of truncated normal, half normal, and exponential derived in Chapter 2. MC simulations were conducted to examine the performance of the DEA model under two different data generating processes, logarithm and level, and across five different scenarios: inefficiency distributions, sample sizes, production functions, input distributions, and the curse of dimensionality. The results caution DEA practitioners concerning the accuracy of their estimates and the implications within the proposed and conventional approaches to the inefficiency distributions. Second, this dissertation presents in Chapter 4 an empirical assessment of the liquidity and solvency financial factors on the cost efficiency measures of U.S. banks, while accounting for regulatory, macroeconomic, and bank internal factors. The results suggest that the liquidity and solvency financial factors negatively impacted the cost efficiency measures of U.S. banks from 2005 to 2017. Moreover, during the financial crisis, U.S. banks were inefficient in comparison with the tranquil period, and the solvency financial factor insignificantly impacted the cost efficiency measures. In addition, U.S. banks' liquidity financial factor collapsed due to contagion during the financial crisis.

Item: A Conditional Random Field (CRF) Based Machine Learning Framework for Product Review Mining (North Dakota State University, 2019). Ming, Yue.

The task of opinion mining from product reviews has been achieved by employing rule-based approaches or generative learning models such as hidden Markov models (HMMs). This paper introduced a discriminative model using linear-chain Conditional Random Fields (CRFs), which can naturally incorporate arbitrary, non-independent features of the input without requiring conditional independence among the features or distributional assumptions about the inputs. The framework first performs part-of-speech (POS) tagging over each word in the sentences of the review text. The performance is evaluated based on three criteria: precision, recall and F-score. The results show that this approach is effective for this type of natural language processing (NLP) task. The framework then extracts the keywords associated with each product feature and summarizes them into concise lists that are simple and intuitive for people to read.
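For readers unfamiliar with linear-chain CRF tagging, the sketch below builds a toy POS tagger with the sklearn-crfsuite toolkit, whose c1 and c2 parameters apply L1 and L2 penalties to the feature weights. The toolkit choice, feature functions, toy sentences and tags are all illustrative assumptions, not the paper's framework.

```python
# Toy linear-chain CRF POS tagger using sklearn-crfsuite (illustrative features and data).
import sklearn_crfsuite
from sklearn_crfsuite import metrics

def word_features(sentence, i):
    word = sentence[i]
    return {
        "word.lower": word.lower(),
        "word.isupper": word.isupper(),
        "suffix3": word[-3:],
        "prev_word": sentence[i - 1].lower() if i > 0 else "<BOS>",
        "next_word": sentence[i + 1].lower() if i < len(sentence) - 1 else "<EOS>",
    }

# Tiny toy corpus: token lists with parallel POS-tag lists.
sentences = [["The", "battery", "lasts", "long"], ["Screen", "is", "sharp"]]
tags = [["DT", "NN", "VBZ", "RB"], ["NN", "VBZ", "JJ"]]

X = [[word_features(s, i) for i in range(len(s))] for s in sentences]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=100)
crf.fit(X, tags)
pred = crf.predict(X)
print(metrics.flat_f1_score(tags, pred, average="weighted", labels=list(crf.classes_)))
```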
Item: Proposed Nonparametric Tests for the Umbrella Alternative in a Mixed Design for Both Known and Unknown Peak (North Dakota State University, 2019). Alsuhabi, Hassan Rashed.

In several situations, and among various treatment effects, researchers might test for an umbrella alternative. The need for an umbrella alternative arises in the evaluation of the reaction to drug dosage. For instance, the reaction might increase as the level of drug dosage increases, and a downturn may occur after the optimal dosage is exceeded. A test statistic for the umbrella alternative was proposed by Mack and Wolfe (1981) for the completely randomized design. Moreover, an extension of the Mack-Wolfe test for the randomized complete block design was proposed by Kim and Kim (1992), where the blocking factor was introduced. This thesis proposes two nonparametric test statistics for mixed design data with k treatments when the peak is known, and four statistics when the peak is unknown. The data are a mixture of a CRD and an RCBD. A Monte Carlo simulation is conducted to compare the power of the first two proposed tests when the peak is known, and each of them is compared to the tests proposed by Magel et al. (2010). A simulation is also conducted to compare the power of the last four proposed tests when the peak is unknown. In this study, we simulate from the exponential, normal and t (3 degrees of freedom) distributions. For every distribution, equal sample sizes for the CRD portion are selected so that the sample size, n, is 6, 10, 16 and 20. The number of blocks for the RCBD is taken to be half, equal to, and twice the sample size for each treatment. Furthermore, a variety of location parameter configurations are considered for three, four and five populations. The powers were estimated for both cases, known and unknown peak. In both cases, the results of the simulation study show that the proposed tests that standardize first generally perform better than those that standardize second. This thesis also shows that adding the distance modification to the Mack-Wolfe and Kim-Kim statistics provides more power to the proposed test statistics than to those without the distance modification.

Item: Proposed Methods for the Nondecreasing Order-Restricted Alternative in a Mixed Design (North Dakota State University, 2020). Alnssyan, Badr Suliman.

Nonparametric statistics are commonly used due to their robustness when the underlying assumptions of the usual parametric statistics are violated. In this dissertation, we proposed eight nonparametric methods to test for a nondecreasing ordered alternative in a mixed design consisting of a combination of a completely randomized design (CRD) and a randomized complete block design (RCBD). Four nonparametric tests, based on the Jonckheere-Terpstra test and modifications of it, were employed to construct these methods. A Monte Carlo simulation study was conducted using SAS to investigate the performance of the proposed tests under a variety of nondecreasing location shifts among three, four and five populations, and then to compare these powers with each other and with the powers of the test statistics introduced by Magel et al. (2009). Three underlying distributions are used in the study: the standard normal distribution, the standard exponential distribution and Student's t-distribution (3 degrees of freedom). We considered three scenarios for the proportion of the number of blocks in the RCBD portion to the sample size in the CRD portion, namely, that the number of blocks in the RCBD is larger than, equal to, or smaller than the sample size in the CRD. Moreover, both equal and unequal sample sizes were considered for the CRD portion. The results of the simulation study indicate that all the proposed methods maintain their Type I error and that at least one of the proposed methods did better than the tests of Magel et al. (2009) in terms of estimated power. In general, situations are found in which the proposed methods have higher powers, and situations are found in which the tests in Magel et al. (2009) have higher powers.
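Since several of the mixed-design statistics above start from the Jonckheere-Terpstra test for a nondecreasing alternative, a minimal version of that statistic and its normal approximation is sketched below for the CRD portion only. It ignores ties and is an illustration of the base test, not any of the proposed modified or combined statistics.

```python
# Jonckheere-Terpstra statistic with normal approximation (nondecreasing alternative, no ties).
import numpy as np
from scipy.stats import norm

def jonckheere_terpstra(groups):
    """groups: list of 1-D arrays ordered by the hypothesized nondecreasing effect."""
    jt = sum(np.sum(x < y[:, None])                    # pairs where the later group is larger
             for i, x in enumerate(groups)
             for y in groups[i + 1:])
    n = np.array([len(g) for g in groups])
    N = n.sum()
    mean = (N**2 - np.sum(n**2)) / 4
    var = (N**2 * (2 * N + 3) - np.sum(n**2 * (2 * n + 3))) / 72
    z = (jt - mean) / np.sqrt(var)
    return jt, z, norm.sf(z)                           # one-sided p-value

rng = np.random.default_rng(2)
samples = [rng.normal(loc, 1.0, 10) for loc in (0.0, 0.3, 0.6)]   # nondecreasing shifts
print(jonckheere_terpstra(samples))
```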