Statistics
Permanent URI for this community: hdl:10365/32398
Research from the Department of Statistics. The department website may be found at https://www.ndsu.edu/statistics/
Proceedings for the annual Red River Valley Statistical Conferences may be found at http://hdl.handle.net/10365/26113
Browsing Statistics by type "Master's Paper"
Comparing Tests for a Mixed Design with Block Effect (North Dakota State University, 2009) Zhao, Hui
Tests Comb and Comb II are used to test the equality of means in a mixed design, which is a combination of a randomized complete block design and a completely randomized design. The powers of Comb and Comb II for a mixed design have already been compared with Page's test (Magel, Terpstra, Wen (2009)) when there was little or no block effect added to the portion that was analyzed as a completely randomized design. In this paper, we compare the tests when the portion of the design analyzed as a completely randomized design actually has a block effect. A Monte Carlo simulation study was conducted to compare the power of the three tests, where Page's test was used only on data from the randomized complete block portion. A variety of situations were considered. Three underlying distributions were included in the simulation study: the normal distribution, the exponential distribution, and the t distribution with 3 degrees of freedom. For every distribution, 16, 32 and 40 blocks were used in the randomized complete block design portion, and the equal sample size of the completely randomized portion was 1/8, 1/4 and 1/2 the number of blocks considered. Unequal sample sizes for the completely randomized design portion were also considered. Powers were estimated for different location parameter arrangements for 3, 4 and 5 populations. Two variances, 0.25 and 1, for the block effect were used. The block factor added into the completely randomized design portion did not change which test had the highest rejection percentage for the equal sample size cases, although the powers of the two tests for the mixed design decreased. For most of the unequal sample size cases, Page's test had the highest rejection percentage.
Overall, it was concluded that it was better to use one of the two tests for the mixed design instead of Page's test when there were equal sample sizes for the portion analyzed as a completely randomized design. When the sample sizes were not equal, but the first sample size was twice the size of the others, it was generally better to use Comb over Page's test unless the number of populations became very large or there was a large block effect variance.

A Comparison of the Ansari-Bradley Test and the Moses Test for the Variances (North Dakota State University, 2011) Yuni, Chen
This paper compares the powers and significance levels of two well-known nonparametric tests, the Ansari-Bradley test and the Moses test, both in situations where the equal-median assumption is satisfied and in situations where it is violated. R code is used to generate random data from several distributions: the normal distribution, the exponential distribution, and the t distribution with three degrees of freedom. The power and significance level of each test were estimated for a given situation based on 10,000 iterations. Situations with equal samples of size 10, 20, and 30, and unequal samples of size 10 and 20, 20 and 10, and 20 and 30 were considered for a variety of different location parameter shifts. The study shows that when the two location parameters are equal, the Ansari-Bradley test is generally more powerful than the Moses test regardless of the underlying distribution; when the two location parameters are different, the Moses test is generally preferred.
The study also shows that when the underlying distribution is symmetric, the Moses test with a large subset size k generally has higher power than the test with a smaller k; when the underlying distribution is not symmetric, the Moses test with larger k is more powerful for relatively small sample sizes and the Moses test with medium k has higher power for relatively large sample sizes.

Factors Influencing Carbon Sequestration in Northern Great Plains Grasslands (North Dakota State University, 2011) Annam
Soil development is influenced by the five soil forming factors: parent material, climate, landscape, organisms and time. This study was designed to examine the effects of landscape and organisms (vegetation) on carbon (C) in Conservation Reserve Program (CRP) grasslands, restored grasslands, and undisturbed grasslands across the northern Great Plains of the U.S. using statistical methods. The effects of vegetation, slope, and aspect on C sequestered in the surface 30 cm of the soil for 997 sites sampled across portions of Iowa, Minnesota, Montana, and North and South Dakota were evaluated. A partial F-test was used to evaluate models to determine the significance of factors and their interaction effects. For the vegetation component of these models, cool season grasses with or without legumes showed higher levels of soil organic C than warm season grasses with or without legumes or mixed cool and warm season grass regimes. When slopes were evaluated, slopes less than 3% showed higher levels of sequestered C than slopes greater than 3%. Southern and western aspects showed higher soil C levels than other aspects.

Optimizing Prediction Power of RNA-seq on Intrinsic Characteristics in Breast Cancer (North Dakota State University, 2022) Liu, Yuan
Breast cancer is the most common cancer in women worldwide, and accurate and early detection of breast cancer is vital in characterizing the disease. Abundant tumor and cell state information is embedded in transcriptomic expression.
However, selecting a good pipeline for applying mRNA expression is critical in predicting downstream characteristics. We designed a study that focused on determining the best combinations of preprocessing steps for prediction. We tested six normalization methods, two gene selection methods, and over ten machine learning algorithms. Using appropriate evaluation metrics, we recommend the FPKM normalization method combined with either gene selection method, and employing RF for breast cancer downstream prediction.

A Pilot Study of Module Interconnectedness (North Dakota State University, 2010) Vanguru, Prasanth
Complexity plays an important role in understanding and working with a program, and has been measured in many different ways for software applications. Statistical analysis is one way to predict the pattern of complexity among the modules present in a software application. A random sample of twelve software applications was selected for this study to examine complexity. A single pair of complexity measures was evaluated: the in-degrees and out-degrees for each module of an application. The next step was to try to fit suitable statistical distributions to the in-degrees and to the out-degrees. Using various statistical distributions such as the normal, log-normal, exponential, geometric, uniform, Poisson and chi-square, we try to determine the type of distribution for the in-degrees and the type of distribution for the out-degrees of the modules present in the software applications so that the pattern of complexity can be derived. The chi-square goodness of fit test was used to test various null hypotheses about the distributions for the in-degrees and for the out-degrees.
Results showed that the pattern of in-degrees and the pattern of out-degrees both followed chi-square distributions.

Robust Tests for Cointegration with Application to Statistical Arbitrage Trading Strategies (North Dakota State University, 2010) Hanson, Thomas Alan
This study proposes two new cointegration tests that employ rank-based and least absolute deviation techniques to create a robust version of the Engle-Granger cointegration test. Critical values are generated through a Monte Carlo simulation over a range of error distributions, and the performance of the tests is then compared against the Engle-Granger and Johansen tests. The robust procedures underperform slightly for normally distributed error terms but outperform for fatter-tailed distributions. This characteristic suggests the robust tests are more appropriate for many applications where departures from normality are common. One particular example discussed here is statistical arbitrage, a stock trading strategy based on cointegration and mean reversion. In a simple example, the rank-based procedure produces additional profits over the Engle-Granger procedure.
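
As a rough illustration of the idea in the last abstract, that critical values for an Engle-Granger-type test can be obtained by Monte Carlo simulation, the following minimal sketch may help. It is not the author's procedure: it implements only the classical two-step Engle-Granger statistic (ordinary least squares followed by a Dickey-Fuller regression on the residuals, with no lagged differences), not the rank-based or least absolute deviation variants the paper proposes. All function names, the series length, the replication count, and the choice of normal errors are assumptions made for illustration.

```python
import random
import statistics


def ols_residuals(y, x):
    """Residuals from an OLS regression of y on a constant and x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    beta = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    alpha = my - beta * mx
    return [yi - alpha - beta * xi for xi, yi in zip(x, y)]


def df_t_stat(u):
    """t-statistic for rho in: u[t] - u[t-1] = rho * u[t-1] + e[t].

    No constant and no lagged differences (an illustrative simplification).
    """
    lagged = u[:-1]
    diffs = [b - a for a, b in zip(u[:-1], u[1:])]
    s_ll = sum(v * v for v in lagged)
    rho = sum(v * d for v, d in zip(lagged, diffs)) / s_ll
    resid = [d - rho * v for v, d in zip(lagged, diffs)]
    sigma2 = sum(e * e for e in resid) / (len(diffs) - 1)
    return rho / (sigma2 / s_ll) ** 0.5


def eg_stat(y, x):
    """Two-step Engle-Granger statistic: OLS, then Dickey-Fuller on residuals."""
    return df_t_stat(ols_residuals(y, x))


def random_walk(n, rng):
    """Random walk of length n with standard normal increments (assumed errors)."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path


def simulate_critical_value(n=100, reps=1000, level=0.05, seed=1):
    """Empirical lower-tail quantile of the statistic under no cointegration.

    The null is simulated as two independent random walks.
    """
    rng = random.Random(seed)
    stats = sorted(
        eg_stat(random_walk(n, rng), random_walk(n, rng)) for _ in range(reps)
    )
    return stats[int(level * reps)]


if __name__ == "__main__":
    cv = simulate_critical_value()
    rng = random.Random(7)
    x = random_walk(100, rng)
    y = [2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]  # cointegrated by construction
    print("simulated 5% critical value:", round(cv, 2))
    print("statistic for cointegrated pair:", round(eg_stat(y, x), 2))
```

The null hypothesis of no cointegration is rejected when the statistic falls below the simulated critical value. Replacing `rng.gauss` with a heavier-tailed generator would be one way to explore the range of error distributions the abstract mentions.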