Statistics
Permanent URI for this community: hdl:10365/32398
Research from the Department of Statistics. The department website may be found at https://www.ndsu.edu/statistics/
Proceedings for the annual Red River Valley Statistical Conferences may be found at http://hdl.handle.net/10365/26113
Browsing Statistics by type "Master's paper"
Item An Application of Simplicial Intercept Depth (SID) Method for Fitting Linear Models (North Dakota State University, 2014) Sun, Zhongxing
This paper presents an application based on the Simplicial Intercept Depth method introduced by Liu (2004). We use this method to get the best linear fit of the phenotypic data for spot blotch resistant reaction of two different barley groups. The Simplicial Intercept Depth method is generalized by Simplicial Depth, also proposed by Liu in 1990. It provides a robust way for data analysis when outliers appear. In this paper, we use the Bootstrapping method, which was introduced by Bradley Efron (1979), to resample from the original dataset to get a distribution of the estimates. We also compare the SID with least squares regression and the Theil-type estimate, which was introduced by Shen (2009). The result shows that the SID is a robust method for estimating the coefficients of the linear regression model.
Item Assessing Changes in Within Individual Variation Over Time for Nutritional Intake Data Using 24 Hour Recalls from the National Health and Examination Survey (North Dakota State University, 2012) Brandt, Kyal Scott
Nutritional surveys often use 24-hour recalls to assess the nutritional intake of certain populations. The National Health and Examination Survey (NHANES) collects two 24-hour recalls for each individual in the study. This small sampling can lead to a great deal of variation due to day-to-day differences in an individual’s intake, making it difficult to assess “usual intake.” The ISU method is implemented in the PC-Side software package, breaking our observed variation into two components: within-individual variation (WIV) and between-individual variation (BIV). In this paper, we will use the PC-Side software to get WIV estimates for several different age groups, genders, and nutrients from NHANES nutrition data.
We will look at how WIV estimates change over time and at using past WIV estimates to get a “usual intake” distribution and the calculated proportion below an estimated average requirement (EAR).
Item Bracketing NCAA Men's Division I Basketball Tournament (North Dakota State University, 2013) Zhang, Xiao
This paper presents a new bracketing method for all 63 games in the NCAA Division 1 basketball tournament. This method, based on logistic conditional probability models, is self-consistent in terms of constructing winning probabilities of each game. Empirical results show that this method outperforms the ordinal logistic regression and expectation method with restriction (Restricted OLRE model) proposed by West (2006).
Item Clustering Algorithm Comparison for Ellipsoidal Data (North Dakota State University, 2015) Loeffler, Shane Robert
Cluster analysis is the statistical technique of identifying data points and assigning them to meaningful clusters. The purpose of this paper is to compare different types of clustering algorithms to find the clustering algorithm that performs the best for varying complexities in Gaussian data. The clustering algorithms used include Partitioning Around Medoids (PAM), K-means, and Hierarchical with different linkages (Ward’s linkage, Single linkage, Complete linkage, Average linkage, McQuitty’s method, Gower’s method, and Centroid method). The complexities considered include the number of dimensions, the average pairwise overlap between clusters, and the number of points simulated from each cluster. After the data is simulated, the Adjusted Rand Index will be used to gauge the performance of the clusters.
From that, a t-test will also be used to see if there are any clustering algorithms that perform as well as other clustering algorithms.
Item Comparison of Classification Rates among Logistic Regression, Neural Network and Support Vector Machines in the Presence of Missing Data (North Dakota State University, 2014) Upadhyaya, Sudhi
Statistical models such as Logistic Regression (LR), Neural Network (NN) and Support Vector Machines (SVM) often use datasets with missing values while making inferences regarding the population. When inferences are made based on the data set used, the presence of missing data can severely skew the results and distort the efficiency of the model. Our objective was to identify a robust model among LR, NN, and SVM in the presence of missing data. The study was conducted by simulating observations based on Monte Carlo methods, and missing data was introduced randomly at the 10% level. Single mode imputation was used to impute missing values. Simple random samples of 120, 240 and 500 observations were chosen and these three models were fit for two scenarios. Results showed that the performance of SVM was far superior compared to LR or NN models. However, the classification accuracy of SVM gradually decreased as sample size increased.
Item Comparison of Proposed K Sample Tests with Dietz's Test For Nondecreasing Ordered Alternatives for Bivariate Exponential Data (North Dakota State University, 2011) Pothana, Jyothsnadevi
Comparison of powers is essential to determine the best test that can be used for data under certain specific conditions. Likewise, several nonparametric methods have been developed for testing the ordered alternatives. The Jonckheere-Terpstra (JT) test and the Modified Jonckheere-Terpstra (MJT) test are for testing nondecreasing ordered alternatives for univariate data. The Dietz test is for testing nondecreasing alternatives based on bivariate data.
This paper compares various tests when testing for nondecreasing alternatives specifically when the underlying distributions are bivariate exponential. The JT test and the MJT test are applied to univariate data which is derived by reducing bivariate data to univariate data using various transformations. A Monte Carlo simulation study is conducted comparing the estimated powers of JT tests and MJT tests (based on a variety of transformed univariate data) with the estimated powers of the Dietz test (based on bivariate data) under a variety of location shifts and sample sizes. The results are compared with Zhao's (2011) results for bivariate normal data. The overall best test statistic for bivariate data ordered alternatives is discussed in this paper.
Item Ds-Optimal Design for Model Discrimination in a Probit Model (North Dakota State University, 2014) Liu, Ruifeng
In toxicology studies, dose response functions with a downturn at higher doses are often observed. For such response functions, researchers often want to see if the downturn of the response is significant. A probit model with a quadratic term is adopted to demonstrate the dose response with a downturn. Under the probit model, we obtain optimal designs to study the significance of the downturn and their efficiencies are compared. Our approach identifies the upper bound of the number of optimal design points and searches for the optimal design numerically based on the upper bound.
Item Exploring Associations between Lifestyles and Metabolic Syndrome in Middle-Age Chinese Population (North Dakota State University, 2018) Zhou, Xiaoyi
Nowadays the prevalence of Metabolic Syndrome (MetS) affects many middle-age people in China. MetS is associated with the risk of type 2 diabetes and cardiovascular disease. Identifying the potential risk factors that contribute to MetS is very important for preventing cardiovascular disease. The associations between lifestyles and prevalence of MetS are extensively studied by researchers.
A cross-sectional study conducted by Strand, MA, surveyed 659 subjects in Yuci, China in 2012. The proportional odds model was applied to determine the associations between lifestyles and MetS in three Chinese middle-age groups. The results demonstrated that doing daily exercise was one of the best methods to treat MetS. Moderate alcohol consumption could prevent MetS in the age group born in 1956. Occasional milk consumption could prevent MetS in the age group born in 1964, while it did not help the age groups born in 1960-1961 and in 1956.
Item Forecasting Point Spread for Women’s Volleyball (North Dakota State University, 2016) Zhang, Deling
Volleyball has become a well-known and competitive sport with physical and technical performances over the years. The game results are determined by some important factors such as players, and the team’s skills to succeed in a championship. In this research, we propose to analyze volleyball data by using a multiple linear regression model and a logistic regression model. We develop a multiple regression model using in-game statistics that explain the point spread of a volleyball game. We also develop a logistic regression model that estimates the probability of a team winning the game based on the in-game statistics. Both of the models are validated and then the point spread model is used to predict the results of a volleyball game, replacing the in-game statistics with the averages of the in-game statistics based on the two previous matches of both teams. Results are given.
Item Identifying Significant Factors Influencing Metabolic Syndrome In China (North Dakota State University, 2015) Gu, Xiaoxue
Metabolic Syndrome occurs when a person’s body does not properly use and store energy. The disease has five criteria: abdominal obesity, insulin resistance, hypertension, dyslipidemia, and impaired glucose regulation. The purpose of this paper was to analyze longitudinal data obtained from China.
The data was collected using surveys in 2008 and 2012. For finding the factors that contributed significantly to the development of Metabolic Syndrome, a marginal model was applied. To fit the marginal model, the Generalized Estimating Equation method was used. The developed model did not have high accuracy in presenting the proportion of true results (Metabolic Syndrome observed and no Metabolic Syndrome observed).
Item The Influence of Race, Age, Comorbidities, and BMI on Disability Following Stroke in Elderly People Living in Their Own Home (North Dakota State University, 2020) Endo, Izumi
Stroke is one of the major health issues in the United States. I explored different aspects of disability based on a history of stroke, race, comorbidities, age, and body mass index for the population of community-dwelling stroke survivors. Using a dataset drawn from the first wave of the longitudinal study of the National Social Life, Health, and Aging Project (Waite et al., 2019), analysis was performed. The dataset consists of a nationally representative sample of 3,005 community-dwelling people between the ages of 57 and 86 years old at the time of recruitment. The results demonstrated that the history of stroke, presence of comorbidities such as arthritis, chronic obstructive pulmonary disease, asthma, and heart failure, age, and body mass index significantly influenced the amount of disability an elderly person had. Performing screening and addressing the issues are essential to lower the amount of disability in the elderly population.
Item Investment Behavior Analysis Based on Tail Risk Management (North Dakota State University, 2018) Sun, Yu
As behavioral finance is becoming more prevalent in the academic area, a study is worth conducting to pinpoint investors’ preference through managing tail risk of asset portfolios. This study investigates investors’ investment behaviors by modeling their investment personalities based on tail risk management.
We incorporate the CVaR approach to model traditional and non-traditional investment behaviors by reshaping the tails of portfolio return. To be specific, we build models that maximize left-tail CVaR, minimize right-tail CVaR, and minimize left-tail CVaR, and a mixed model that maximizes left-tail CVaR and minimizes right-tail CVaR simultaneously, based on various groups of rational and irrational investors. Our work incorporates empirical historical data and Monte Carlo simulation to compare these models with the classical Markowitz approach via different dimensions. We make contributions to fill the gap by making a more comprehensive study that incorporates investors’ psychological factors and exploring economic information regarding the asset pricing puzzle and long-run risk.
Item Optimal Designs for the Hill Model with Three Parameters (North Dakota State University, 2012) Dockter, Travis Jon
Optimal designs specify design points to use and how to distribute subjects over these design points in the most efficient manner. The Hill model with three parameters is often used to describe sigmoid dose response functions. In our paper, we study optimal designs under the Hill model. The first is D-optimal design, which works best to study the model to fit the data. Next is c-optimal design, which works best to study a target dose level, such as ED50 - the dose level with 50% maximum treatment effect. The third is a two-stage optimal design, which considers both D-optimality and c-optimality. In order to compare the optimal designs, their design efficiencies are compared.
Item Prediction of the World Cup Soccer Winner: Using Two Statistical Methods (North Dakota State University, 2016) Sylla, Mohamed Dit Mody
Soccer is considered the most popular sport on earth and applying statistical models to analyze small soccer data has been of keen interest to modern researchers. Statistical modeling of soccer data also provides guidance and assistance to stakeholders.
The goal of this paper is to establish a consistent statistical approach to help in the prediction of future World Cup championships. Ordinary least squares regression is used to develop models which predict goal margin of games and logistic regression is used to develop models which estimate the probability of a team winning the game. Discriminant Analysis was also used to determine which variables significantly influence individual game wins. The Fisher classification procedure allows for interpretability while providing a robust approach to classifying the 32 contestants of the 2014 World Cup using the previous data from the 2006 and 2010 World Cup Championships.
Item A Proposed Nonparametric Test for Simple Tree Alternative in a BIBD Design (North Dakota State University, 2011) Wang, Zhuangli
A nonparametric test is proposed to test for the simple tree alternative in a Balanced Incomplete Block Design (BIBD). The details of the test statistic when the null hypothesis is true are given. The paper also introduces the calculations of the means and variances under a variety of situations. A Monte Carlo simulation study based on SAS is conducted to compare the powers of the newly proposed test and the Durbin test. The simulation study is used to generate the BIBD data from three distributions: the normal distribution, the exponential distribution, and the Student's t distribution with three degrees of freedom. The powers of the proposed test and the Durbin test are both estimated based on 10,000 iterations for three, four, and five treatments, and for different location shifts.
According to the results of the simulation study, the Durbin test is better when at least one treatment mean is close to or equal to the control mean; otherwise, the proposed test is better.
Item Robust D-Optimal Design for Multiple Nominal Parameter Values under the 5PL-1P Model (North Dakota State University, 2018) Liang, Cuiping
A robust D-optimal design that works well for multiple nominal parameter values is presented in this paper. In general, D-optimal design works very well for estimating the model parameters, but it is very sensitive to multiple nominal model parameter values when the response is modeled by nonlinear models. The 5PL-1P model is considered in this study to describe a dose-response function. The sensitivity of the D-optimal design to the model parameter values under the 5PL-1P model is studied. The robust D-optimal design that can reduce the impact of the multiple nominal model parameter values is proposed using the Bayesian technique. Lastly, we compare the performance of the proposed design with that of other well-known designs for estimating the model parameters under the 5PL-1P model.
Item Robust D-Optimal Design for Response Functions with a Downturn (North Dakota State University, 2013) Carter, Jessica Anne
Researchers studying dose-response relationships must allocate limited resources to design points in order to maximize the information gained from the study. D-optimal design is a well-described design that works efficiently to study model parameters. In order to find the D-optimal design, the model that describes the dose-response relationship has to be known. In cases where dose-response relationships show a downturn at high doses, scientists sometimes ignore the downturn to study only the increasing part of the response curve. Here we have two model choices; one describes the overall dose-response relationship, and the other describes only the increasing part of the response curve.
The D-optimal designs for these two models will be different, and the D-optimal design for one model may not work efficiently for the other model. This research studies robust D-optimal design, a design that works efficiently for both models.
Item Rutin Extraction and Content in Buckwheat (Fagopyrum esculentum) Bran-Fortified Pasta (North Dakota State University, 2019) Kaiser, Amber Christine
The objectives of this study were to optimize extraction of rutin from buckwheat bran and buckwheat bran-fortified spaghetti and to determine the stability of rutin during spaghetti production and preparation. Aqueous ethanol and ethanol at 50, 60, 70, 80, and 90 % were used with Soxhlet or ultrasound-assisted extraction methods and 80 % methanol extraction was evaluated with or without papain treatment. Optimal extraction treatment (80 % methanol using ultrasound-assisted extraction without enzyme treatment) was used to determine rutin content in buckwheat bran-fortified spaghetti dried at low (40 °C) or high (90 °C) temperature. Rutin content was evaluated in raw, hydrated, extruded, dried, and cooked pasta. High temperature drying reduced rutin content more than low temperature drying, and total reduction in rutin content from raw pasta mix to cooked pasta was 25 – 30 %.