Statistics Masters Papers
Permanent URI for this collection: hdl:10365/32400
Comparing Tests for a Mixed Design with Block Effect (North Dakota State University, 2009) Zhao, Hui
Tests Comb and Comb II are used to test the equality of means in a mixed design, which combines a randomized complete block design and a completely randomized design. The powers of Comb and Comb II for a mixed design have already been compared with Page's test (Magel, Terpstra, and Wen, 2009) when little or no block effect was added to the portion analyzed as a completely randomized design. In this paper, we compare the tests when the portion of the design analyzed as a completely randomized design actually has a block effect. A Monte Carlo simulation study was conducted to compare the power of the three tests, with Page's test applied only to data from the randomized complete block portion. A variety of situations were considered. Three underlying distributions were included in the simulation study: the normal distribution, the exponential distribution, and the t distribution with 3 degrees of freedom. For every distribution, 16, 32, and 40 blocks were used in the randomized complete block portion, and the equal sample sizes in the completely randomized portion were 1/8, 1/4, and 1/2 of the number of blocks. Unequal sample sizes for the completely randomized portion were also considered. Powers were estimated for different location parameter arrangements for 3, 4, and 5 populations. Two variances, 0.25 and 1, were used for the block effect. Adding the block factor to the completely randomized portion did not change which test had the highest rejection percentage in the equal sample size cases, although the powers of the two tests for the mixed design decreased. For most of the unequal sample size cases, Page's test had the highest rejection percentage.
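As an illustration of the kind of Monte Carlo power study described above, here is a minimal sketch for Page's test on a randomized complete block layout only. It does not reproduce Comb or Comb II, which the abstract does not define. The function names, the normal errors, the normal approximation for the test statistic, and the one-sided 5% level are all assumptions made for this example, not details taken from the paper.

```python
import random
from statistics import NormalDist


def page_L(blocks):
    """Page's L: rank observations within each block, then sum j * R_j,
    where R_j is the rank sum of treatment j (j = 1..k)."""
    k = len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        order = sorted(range(k), key=lambda j: row[j])
        for rank, j in enumerate(order, start=1):
            rank_sums[j] += rank  # data are continuous, so ties are ignored
    return sum((j + 1) * rank_sums[j] for j in range(k))


def page_z(blocks):
    """Standardize L with its null mean and variance (normal approximation)."""
    b, k = len(blocks), len(blocks[0])
    mean = b * k * (k + 1) ** 2 / 4
    var = b * k ** 2 * (k + 1) * (k ** 2 - 1) / 144
    return (page_L(blocks) - mean) / var ** 0.5


def estimate_power(shifts, n_blocks, block_sd, n_sims=2000, alpha=0.05, seed=1):
    """Fraction of simulated data sets in which the one-sided test rejects.

    shifts   -- hypothetical location parameters, one per population
    block_sd -- standard deviation of the random block effect
    """
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha)
    rejections = 0
    for _ in range(n_sims):
        data = []
        for _ in range(n_blocks):
            effect = rng.gauss(0, block_sd)  # shared by every unit in the block
            data.append([effect + s + rng.gauss(0, 1) for s in shifts])
        if page_z(data) > crit:
            rejections += 1
    return rejections / n_sims
```

Calling `estimate_power([0, 0.5, 1.0], n_blocks=16, block_sd=0.5)` estimates power for three populations with an increasing location shift, and calling it with all shifts equal to zero estimates the rejection rate under the null hypothesis. Because ranks are taken within blocks, the block effect cancels for Page's test in this sketch.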
Overall, it was concluded that it was better to use one of the two tests for the mixed design instead of Page's test when the sample sizes were equal in the portion analyzed as a completely randomized design. When the sample sizes were unequal, with the first sample twice the size of the others, it was generally better to use Comb over Page's test unless the number of populations became very large or the block effect variance was large.

Factors Influencing Carbon Sequestration in Northern Great Plains Grasslands (North Dakota State University, 2011) Annam
Soil development is influenced by the five soil-forming factors: parent material, climate, landscape, organisms, and time. This study was designed to examine, using statistical methods, the effects of landscape and organisms (vegetation) on carbon (C) in Conservation Reserve Program (CRP) grasslands, restored grasslands, and undisturbed grasslands across the northern Great Plains of the U.S. The effects of vegetation, slope, and aspect on C sequestered in the surface 30 cm of the soil were evaluated for 997 sites sampled across portions of Iowa, Minnesota, Montana, and North and South Dakota. A partial F-test was used to evaluate models and determine the significance of the factors and their interaction effects. For the vegetation component of these models, cool season grasses with or without legumes showed higher levels of soil organic C than warm season grasses with or without legumes or mixed cool and warm season grass regimes. Slopes of less than 3% showed higher levels of sequestered C than slopes greater than 3%. Southern and western aspects showed higher soil C levels than other aspects.

Optimizing Prediction Power of RNA-seq on Intrinsic Characteristics in Breast Cancer (North Dakota State University, 2022) Liu, Yuan
Breast cancer is the most common cancer in women worldwide, and accurate, early detection is vital to characterizing the disease.
Transcriptomic expression data carry abundant information about tumor and cell state. However, the choice of pipeline for processing mRNA expression data is critical for predicting downstream characteristics. We designed a study to determine the best combinations of preprocessing steps for prediction. We tested six normalization methods, two gene selection methods, and over ten machine learning algorithms. Based on appropriate evaluation metrics, we recommend FPKM normalization combined with either gene selection method, with RF as the learning algorithm, for downstream prediction in breast cancer.

A Pilot Study of Module Interconnectedness (North Dakota State University, 2010) Vanguru, Prasanth
Complexity plays an important role in understanding and working with a program, and it has been measured in many different ways for software applications. Statistical analysis is one way to predict the pattern of complexity among the modules of a software application. A random sample of twelve software applications was selected for this study. A single pair of complexity measures was evaluated: the in-degrees and out-degrees of each module of an application. The next step was to fit suitable statistical distributions to the in-degrees and to the out-degrees. Using distributions such as the normal, log-normal, exponential, geometric, uniform, Poisson, and chi-square, we tried to determine the type of distribution for the in-degrees and for the out-degrees of the modules so that the pattern of complexity could be derived. The chi-square goodness-of-fit test was used to test various null hypotheses about the distributions of the in-degrees and of the out-degrees.
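To make the goodness-of-fit step concrete, the following is a minimal sketch of a Pearson chi-square test of a Poisson fit to module degree counts. The Poisson is only one of the candidate distributions listed above, and the function name, the binning rule, and the sample data are hypothetical choices for this example rather than anything taken from the paper.

```python
import math
from collections import Counter


def poisson_gof(degrees, max_bin=3):
    """Pearson chi-square statistic for a Poisson fit to count data.

    Bins are 0, 1, ..., max_bin - 1 plus a pooled tail 'max_bin or more'.
    Returns (statistic, conventional degrees of freedom).
    """
    n = len(degrees)
    lam = sum(degrees) / n  # Poisson mean estimated from the sample
    counts = Counter(min(d, max_bin) for d in degrees)
    probs = [math.exp(-lam) * lam ** x / math.factorial(x) for x in range(max_bin)]
    probs.append(1.0 - sum(probs))  # probability of the pooled upper tail
    stat = sum((counts.get(x, 0) - n * p) ** 2 / (n * p)
               for x, p in enumerate(probs))
    df = len(probs) - 1 - 1  # bins, minus 1, minus one estimated parameter
    return stat, df


# Hypothetical in-degrees for 100 modules (illustrative data only).
sample = [0] * 30 + [1] * 36 + [2] * 22 + [3] * 9 + [4] * 3
statistic, df = poisson_gof(sample)
```

The null hypothesis that the degrees follow the fitted distribution is rejected when the statistic exceeds the chi-square critical value for `df` degrees of freedom; with 2 degrees of freedom the 5% critical value is 5.991.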
Results showed that the pattern of in-degrees and the pattern of out-degrees both followed chi-square distributions.

Robust Tests for Cointegration with Application to Statistical Arbitrage Trading Strategies (North Dakota State University, 2010) Hanson, Thomas Alan
This study proposes two new cointegration tests that employ rank-based and least absolute deviation techniques to create a robust version of the Engle-Granger cointegration test. Critical values are generated through a Monte Carlo simulation over a range of error distributions, and the performance of the tests is then compared against the Engle-Granger and Johansen tests. The robust procedures underperform slightly for normally distributed error terms but outperform for fatter-tailed distributions. This characteristic suggests the robust tests are more appropriate for many applications where departures from normality are common. One particular example discussed here is statistical arbitrage, a stock trading strategy based on cointegration and mean reversion. In a simple example, the rank-based procedure produces additional profits over the Engle-Granger procedure.
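For readers unfamiliar with the baseline procedure, here is a minimal sketch of the standard two-step Engle-Granger statistic together with Monte Carlo critical values simulated under the null of no cointegration. It does not implement the paper's rank-based or least absolute deviation variants. The function names, the normal errors, the series length, and the Dickey-Fuller regression without a constant or lagged differences are assumptions made for this example.

```python
import random


def eg_statistic(y, x):
    """Step 1: OLS of y on x with an intercept.
    Step 2: Dickey-Fuller t statistic on the step-1 residuals."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    beta = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    alpha = my - beta * mx
    resid = [b - alpha - beta * a for a, b in zip(x, y)]
    # Regress the change in the residual on its lagged level (no constant).
    lag = resid[:-1]
    diff = [resid[t] - resid[t - 1] for t in range(1, n)]
    s_ll = sum(v * v for v in lag)
    rho = sum(l * d for l, d in zip(lag, diff)) / s_ll
    sse = sum((d - rho * l) ** 2 for l, d in zip(lag, diff))
    std_err = (sse / (len(diff) - 1) / s_ll) ** 0.5
    return rho / std_err


def random_walk(n, rng):
    """Driftless random walk with standard normal increments."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0, 1)
        path.append(level)
    return path


def simulate_critical_value(n=100, n_sims=1000, level=0.05, seed=7):
    """Lower-tail quantile of the statistic for independent random walks."""
    rng = random.Random(seed)
    stats = sorted(eg_statistic(random_walk(n, rng), random_walk(n, rng))
                   for _ in range(n_sims))
    return stats[int(level * n_sims)]
```

A pair of series is judged cointegrated when `eg_statistic(y, x)` falls below the simulated critical value. Swapping `rng.gauss` for a fatter-tailed generator is one way to explore how the critical values respond to the error distribution.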