Statistics Masters Theses
Permanent URI for this collection: hdl:10365/32401
Browsing Statistics Masters Theses by program "Applied Statistics"
Now showing 1 - 6 of 6
Item Analyzing and Controlling Biases in Student Rating of Instruction (North Dakota State University, 2019) Zhou, Yue
Many colleges and universities have adopted the student ratings of instruction (SROI) system as one measure of instructional effectiveness. This study aims to establish a predictive model and address two questions related to SROI: first, whether gender bias against female instructors exists at North Dakota State University (NDSU), and second, how other factors related to students, instructors, and courses affect the SROI. In total, 30,303 SROI from seven colleges at NDSU for the 2013-2014 academic year are studied. Our results demonstrate a significant association between students’ gender and instructors’ gender in the rating scores. Therefore, we cannot determine how an instructor's gender affects the course rating unless we know the gender composition of the students in that class. Predictive proportional odds models for the students’ ordinal categorical ratings are established.

Item Empirical Study of Two Hypothesis Test Methods for Community Structure in Networks (North Dakota State University, 2019) Nan, Yehong
Many real-world network data can be formulated as graphs, where a binary relation exists between nodes. One of the fundamental problems in network data analysis is community detection, that is, clustering the nodes into different groups. Statistically, this problem can be formulated as hypothesis testing: under the null hypothesis, there is no community structure, while under the alternative hypothesis, community structure exists. One method, proposed by Bickel and Sarkar (2016), uses the largest eigenvalues of the scaled adjacency matrix and works for dense graphs. Another is the subgraph counting method proposed by Gao and Lafferty (2017a), which is valid for sparse networks.
In this paper, first, we empirically study the Bickel-Sarkar (BS) and Gao-Lafferty (GL) methods to see whether either of them works for moderately sparse networks; second, we propose a subsampling method to reduce the computation of the BS method and run simulations to evaluate its performance.

Item Entropy as a Criterion for Variable Reduction in Cluster Data (North Dakota State University, 2012) Olson, Christopher
Entropy is a measure of the randomness of a system state. This quantity gives us a measure of the uncertainty associated with each particular observation belonging to a specific cluster. We examine this property and its potential use in analyzing high-dimensional datasets. Entropy proves most interesting in identifying dimensions that do not contribute meaningfully to the classification of the clusters present. We can remove the least important dimension(s) found and generalize this idea to a procedure. After identifying all the dimensions that should be eliminated from the dataset, we then compare its ability in recovering the true classification of the observations versus the estimated classification of the data. From the results obtained and shown in this paper, it is clear that entropy is a good candidate for a criterion in variable reduction.

Item Forecasting Batter Performance using Statcast Data in Major League Baseball (North Dakota State University, 2017) Taylor, Nicholas Christopher
2015 saw the release of the Statcast camera system within Major League Baseball ballparks, which provided statisticians with new data to analyze. One statistic, average exit velocity, is of particular interest. We would like to see if a batter’s average exit velocity can significantly explain the variation in his slugging percentage and batting average on balls in play (BABIP) when taken into account with other, more traditional baseball statistics. These two statistics are of particular interest within advanced baseball data analysis.
We found that a player’s average exit velocity can significantly explain the variation in both his slugging percentage and his BABIP. We also discovered that the significance is stronger in explaining slugging percentage than in explaining BABIP.

Item A Model to Predict Matriculation of Concordia College Applicants (North Dakota State University, 2017) Pavlik, Kaylin
Colleges and universities are under mounting pressure to meet enrollment goals in the face of declining college attendance. Insight into student-level probability of enrollment, as well as the identification of features relevant in student enrollment decisions, would assist in the allocation of marketing and recruitment resources and the development of future yield programs. A logistic regression model was fit to predict which applicants will ultimately matriculate (enroll) at Concordia College. Demographic, geodemographic, and behavioral features were used to build the model, which assigns a probability of enrollment to each applicant. Behaviors indicating interest (campus visits, submitting a deposit) and residing in a zip code with high alumni density were found to be strong predictors of matriculation. The model was fit to minimize the false negative rate, which was limited to 18.1 percent, compared to 50-60 percent reported by comparable studies. Overall, the model was 80.13 percent accurate.

Item Testing Parallelism for the Four-Parameter Logistic Model with D-Optimal Design (North Dakota State University, 2018) Lin, Ying
In order to determine the potency of the test preparation relative to the standard preparation, it is often important to test parallelism between a pair of dose-response curves of the reference standard and the test sample. Optimal designs are known to be more powerful in testing parallelism than classical designs. In this study, a D-optimal design was implemented to study parallelism and to compare its performance with that of a classical design.
We modified the D-optimal design to test parallelism in the four-parameter logistic (4PL) model using the Intersection-Union Test (IUT). The IUT method is appropriate when the null hypothesis is expressed as a union of sets, and with this method complicated tests involving several parameters are easily constructed. Since the D-optimal design minimizes the variances of model parameters, it can bring more power to the IUT. A simulation study will be presented to compare the empirical properties of the two different designs.
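
As a rough illustration of what parallelism means for 4PL dose-response curves, the sketch below uses one common parameterization of the 4PL model; the thesis may use a different one, and the function name `four_pl`, its argument names, and the numeric values are all hypothetical. Under this parameterization, two curves are parallel when they share the lower asymptote, upper asymptote, and slope and differ only in the EC50, so the test curve is the reference curve shifted along the dose axis by the relative potency.

```python
import math


def four_pl(dose, lower, upper, ec50, slope):
    """Response of a four-parameter logistic curve at a positive dose.

    Assumed parameterization (not taken from the thesis):
        y = lower + (upper - lower) / (1 + (ec50 / dose) ** slope)
    """
    return lower + (upper - lower) / (1.0 + (ec50 / dose) ** slope)


# Hypothetical reference curve and relative potency, for illustration only.
lower, upper, slope = 0.0, 100.0, 1.5
ec50_ref = 10.0
rel_potency = 2.0

# A parallel test curve shares lower, upper, and slope; only the EC50 changes.
ec50_test = ec50_ref / rel_potency

# Parallelism implies the test response at a dose equals the reference
# response at rel_potency times that dose.
dose = 3.0
test_response = four_pl(dose, lower, upper, ec50_test, slope)
shifted_reference = four_pl(dose * rel_potency, lower, upper, ec50_ref, slope)
print(math.isclose(test_response, shifted_reference))
```

A parallelism test such as the IUT mentioned above asks whether the data are consistent with this shared-parameter structure; the sketch only shows the structure itself, not the test or the D-optimal design.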