NDSU Theses & Dissertations
Permanent URI for this community: hdl:10365/26050
Research performed to achieve a formal degree from NDSU. Includes theses, dissertations, master's papers, and videos. The Libraries are currently undertaking a scanning project to include all bound student theses, dissertations, and master's papers.
Browsing NDSU Theses & Dissertations by Program "Applied Statistics"
Analyzing and Controlling Biases in Student Rating of Instruction (North Dakota State University, 2019)
Zhou, Yue
Many colleges and universities have adopted the student ratings of instruction (SROI) system as one measure of instructional effectiveness. This study aims to establish a predictive model and to address two questions about SROI: first, whether gender bias against female instructors exists at North Dakota State University (NDSU) and, second, how other factors related to students, instructors, and courses affect SROI. In total, 30,303 SROI from seven colleges at NDSU for the 2013-2014 academic year are studied. Our results show a significant association between students’ gender and instructors’ gender in the rating scores. Therefore, we cannot determine how an instructor’s gender affects the course rating unless we know the gender composition of the students in that class. Predictive proportional odds models are established for the students’ ordinal categorical ratings.

Ds-Optimal Design for Model Discrimination in a Probit Model (North Dakota State University, 2014)
Liu, Ruifeng
In toxicology studies, dose-response functions with a downturn at higher doses are often observed. For such response functions, researchers often want to know whether the downturn in the response is significant. A probit model with a quadratic term is adopted to describe a dose response with a downturn. Under this probit model, we obtain optimal designs for studying the significance of the downturn and compare their efficiencies. Our approach identifies an upper bound on the number of optimal design points and then searches numerically for the optimal design based on that bound.

Empirical Study of Two Hypothesis Test Methods for Community Structure in Networks (North Dakota State University, 2019)
Nan, Yehong
Many real-world network datasets can be formulated as graphs, in which a binary relation exists between nodes.
One of the fundamental problems in network data analysis is community detection, that is, clustering the nodes into groups. Statistically, this problem can be formulated as a hypothesis test: under the null hypothesis there is no community structure, while under the alternative hypothesis community structure exists. One method, proposed by Bickel and Sarkar (2016) (the BS method), uses the largest eigenvalues of the scaled adjacency matrix and works for dense graphs. Another, proposed by Gao and Lafferty (2017a) (the GL method), is based on subgraph counting and is valid for sparse networks. In this paper, we first study the BS and GL methods empirically to see whether either of them works for moderately sparse networks; we then propose a subsampling method to reduce the computational cost of the BS method and run simulations to evaluate its performance.

Entropy as a Criterion for Variable Reduction in Cluster Data (North Dakota State University, 2012)
Olson, Christopher
Entropy is a measure of the randomness of a system state. Here it measures the uncertainty associated with each observation belonging to a specific cluster. We examine this property and its potential use in analyzing high-dimensional datasets. Entropy proves most useful in identifying dimensions that do not contribute meaningfully to the classification of the clusters present. The least important dimension(s) found this way can be removed, and we generalize this idea into a procedure. After identifying all the dimensions that should be eliminated from the dataset, we compare the true classification of the observations with the estimated classification to assess how well the reduced data recover it.
The results obtained in this paper show that entropy is a good candidate criterion for variable reduction.

Forecasting Batter Performance using Statcast Data in Major League Baseball (North Dakota State University, 2017)
Taylor, Nicholas Christopher
In 2015, the Statcast camera system was released in Major League Baseball ballparks, providing statisticians with new data to analyze. One statistic, average exit velocity, is of particular interest. We examine whether a batter’s average exit velocity can significantly explain the variation in his slugging percentage and batting average on balls in play (BABIP) when combined with other, more traditional baseball statistics. These two statistics are central to advanced baseball data analysis. We found that a player’s average exit velocity can significantly explain the variation in both his slugging percentage and his BABIP, and that the effect is more significant for slugging percentage than for BABIP.

Forecasting Point Spread for Women’s Volleyball (North Dakota State University, 2016)
Zhang, Deling
Over the years, volleyball has become a well-known and competitive sport that demands both physical and technical performance. Game results, and a team’s success in a championship, are determined by important factors such as the players and the team’s skills. In this research, we analyze volleyball data using a multiple linear regression model and a logistic regression model. The multiple regression model uses in-game statistics to explain the point spread of a volleyball game. The logistic regression model estimates the probability of a team winning the game based on the same in-game statistics.
Both models are validated, and the point-spread model is then used to predict the results of a volleyball game by replacing the in-game statistics with each team’s averages of those statistics over its two previous matches. Results are given.

A Model to Predict Matriculation of Concordia College Applicants (North Dakota State University, 2017)
Pavlik, Kaylin
Colleges and universities are under mounting pressure to meet enrollment goals in the face of declining college attendance. Insight into each student’s probability of enrollment, together with the identification of features relevant to students’ enrollment decisions, would assist in allocating marketing and recruitment resources and in developing future yield programs. A logistic regression model built on demographic, geodemographic, and behavioral features was fit to assign each applicant a probability of enrollment and thereby predict which applicants will ultimately matriculate (enroll) at Concordia College. Behaviors indicating interest (campus visits, submitting a deposit) and residing in a zip code with high alumni density were found to be strong predictors of matriculation. The model was fit to minimize the false negative rate, which was limited to 18.1 percent, compared with 50-60 percent reported by comparable studies. Overall, the model was 80.13 percent accurate.

Rutin Extraction and Content in Buckwheat (Fagopyrum esculentum) Bran-Fortified Pasta (North Dakota State University, 2019)
Kaiser, Amber Christine
The objectives of this study were to optimize the extraction of rutin from buckwheat bran and buckwheat bran-fortified spaghetti and to determine the stability of rutin during spaghetti production and preparation. Aqueous ethanol and ethanol at 50, 60, 70, 80, and 90 % were used with Soxhlet or ultrasound-assisted extraction, and 80 % methanol extraction was evaluated with or without papain treatment.
The optimal extraction treatment (80 % methanol with ultrasound-assisted extraction and no enzyme treatment) was used to determine rutin content in buckwheat bran-fortified spaghetti dried at low (40 °C) or high (90 °C) temperature. Rutin content was evaluated in raw, hydrated, extruded, dried, and cooked pasta. High-temperature drying reduced rutin content more than low-temperature drying, and the total reduction in rutin content from raw pasta mix to cooked pasta was 25-30 %.

Testing Parallelism for the Four-Parameter Logistic Model with D-Optimal Design (North Dakota State University, 2018)
Lin, Ying
To determine the potency of a test preparation relative to a standard preparation, it is often important to test parallelism between the pair of dose-response curves for the reference standard and the test sample. Optimal designs are known to be more powerful than classical designs for testing parallelism. In this study, a D-optimal design was implemented to study parallelism, and its performance was compared with that of a classical design. We modified the D-optimal design to test parallelism in the four-parameter logistic (4PL) model using the intersection-union test (IUT). The IUT is appropriate when the null hypothesis is expressed as a union of sets, and it makes complicated tests involving several parameters easy to construct. Since a D-optimal design minimizes the generalized variance of the parameter estimates (the determinant of their covariance matrix), it can bring more power to the IUT. A simulation study is presented to compare the empirical properties of the two designs.
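The intersection-union logic described in the parallelism abstract can be sketched in Python. This is an illustrative sketch only, not the thesis's procedure: it assumes parallelism is judged by equivalence of the differences (test minus standard) in three shared 4PL parameters (lower asymptote, upper asymptote, slope), with analyst-chosen equivalence margins and a normal approximation for each estimated difference. Each component null |Δ_k| ≥ δ_k is tested with two one-sided tests (TOST); the overall null is the union of these sets, so it is rejected only when every component is rejected, which makes the IUT p-value the largest component p-value. The function names, margins, and example numbers are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def tost_p_value(diff: float, se: float, margin: float) -> float:
    """Two one-sided tests (TOST) for one parameter difference.

    H0: |diff| >= margin  versus  H1: |diff| < margin, using a normal
    approximation for the estimated difference with standard error se.
    """
    if se <= 0 or margin <= 0:
        raise ValueError("se and margin must be positive")
    # One-sided test of H0: diff <= -margin (rejected for large estimates).
    p_lower = 1.0 - _STD_NORMAL.cdf((diff + margin) / se)
    # One-sided test of H0: diff >= +margin (rejected for small estimates).
    p_upper = _STD_NORMAL.cdf((diff - margin) / se)
    # TOST is itself an IUT: both one-sided nulls must be rejected.
    return max(p_lower, p_upper)


def iut_parallelism(diffs, ses, margins, alpha=0.05):
    """Hypothetical intersection-union test of parallelism.

    The null of non-parallelism is the union over parameters k of
    {|Delta_k| >= margin_k}; it is rejected at level alpha only if every
    component test rejects at level alpha, so the IUT p-value is the
    maximum of the component p-values.
    """
    if not (len(diffs) == len(ses) == len(margins)):
        raise ValueError("diffs, ses and margins must have equal length")
    p_values = [tost_p_value(d, s, m) for d, s, m in zip(diffs, ses, margins)]
    p_iut = max(p_values)
    return p_iut, p_iut < alpha


if __name__ == "__main__":
    # Hypothetical estimated differences for the lower asymptote, upper
    # asymptote and slope, with standard errors and equivalence margins;
    # none of these numbers come from the thesis.
    diffs = [0.02, -0.05, 0.03]
    ses = [0.03, 0.04, 0.05]
    margins = [0.15, 0.20, 0.25]
    p, parallel = iut_parallelism(diffs, ses, margins)
    print(f"IUT p-value = {p:.2e}, conclude parallel: {parallel}")
```

Because the IUT rejects only when every component test does, it needs no multiplicity adjustment across parameters; its power is limited by the least precisely estimated parameter, which is where a design that shrinks the parameter covariance, as the abstract notes for D-optimality, can help.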