Search Results
Now showing 1 - 10 of 65
Item: Predicting Outcomes of NBA Basketball Games (North Dakota State University, 2016) Jones, Eric Scot
A stratified random sample of 144 NBA basketball games was taken over a three-year period, between 2008 and 2011. Models were developed to predict the point spread and to estimate the probability of a specific team winning based on various in-game statistics. The statistics significant in the model were field-goal shooting percentage, three-point shooting percentage, free-throw shooting percentage, offensive rebounds, assists, turnovers, and free throws attempted. The models were verified using exact in-game statistics for a random sample of 50 NBA games taken during the 2011-2012 season, achieving 88-94% accuracy. Three methods were used to estimate the in-game statistics of future games so that the models could be used to predict the winner of a game between Team A and Team B. Models using these methods had accuracies of approximately 62%. Seasonal averages for these in-game statistics were used in the model developed to predict the winner of each game in the 2013-2016 NBA Championships.

Item: A Visualization Technique for Course Evaluations and Other Likert Scale Data (North Dakota State University, 2018) Saho, Muhammed
Course evaluation is one of the primary ways of collecting feedback from students at NDSU. Since almost every student in every course submits one at the end of the semester, it generates a lot of data. The data are summarized into text-based reports with emphasis on the average rating for each question. At one page per course, analyzing these reports can be overwhelming, and it is very difficult to identify patterns in them. We combine heat maps and small multiples to introduce a visualization of the data that allows for easier comparison between courses, departments, etc. We defined a data format for storing and transmitting the data and built an interactive web application that consumes this format and generates the visualizations.
We simulated reference data to facilitate interpretation of the visualizations. Finally, we discussed how our research can be applied more generally to Likert scale data.

Item: Comparing Total Hip Replacement Drug Treatments for Cost and Length of Stay (North Dakota State University, 2015) Huebner, Blake James
The objective of this study is to identify the potential effects that anticoagulants, spinal blocks, and antifibrinolytics have on overall cost, length of stay, and re-admission rates for total hip replacement patients. We use ordinary least squares regression, multiple comparison testing, logistic regression, and chi-square tests to fulfill this objective. The combination of warfarin and enoxaparin is associated with the highest cost and length of stay among the anticoagulants studied. There is no clear combination of spinal blocks associated with the highest cost and length of stay. Tranexamic acid is associated with a reduction in length of stay and in the likelihood of receiving a blood transfusion, while not increasing overall cost. No drug combination in any category is associated with a change in re-admission rates.

Item: Using Imputed MicroRNA Regulation Based on Weighted Ranked Expression and Putative MicroRNA Targets and Analysis of Variance to Select MicroRNAs for Predicting Prostate Cancer Recurrence (North Dakota State University, 2014) Wang, Qi
Imputed microRNA regulation based on weighted ranked expression and putative microRNA targets (IMRE) is a method to predict microRNA regulation from genome-wide gene expression. A false discovery rate (FDR) for each microRNA is calculated using the expression of the microRNA's putative targets to analyze regulation between different conditions; the FDR identifies differences in gene expression. The dataset used in this research is microarray gene expression data from 596 patients with prostate cancer.
This dataset includes three different phenotypes: PSA (Prostate-Specific Antigen recurrence), Systemic (Systemic Disease Progression), and NED (No Evidence of Disease). We used the IMRE and ANOVA methods to analyze the dataset and identified several microRNA candidates that can be used to predict PSA recurrence and systemic disease progression in prostate cancer patients.

Item: Information Asymmetry in Budget Allocation: An Analysis of the Truth-Inducing Incentive Scheme (North Dakota State University, 2016) Zhou, Yun
Truth-inducing incentive schemes are used to motivate project managers to provide unbiased project information to the portfolio manager, reducing information asymmetry between the portfolio manager and the project managers. To improve the scheme, we identify proper values for the penalty coefficients in the truth-inducing incentive scheme when information asymmetry is present. We first describe the allocation method that achieves budget optimization under certain assumptions, then identify the proper coefficients while accounting for the differing perceptions of the portfolio manager and the project managers. We report a bound on the ratio between the two penalty coefficients in the scheme and then conduct a simulation study to narrow the bound. We conclude that the penalty coefficient for being over budget should be reduced when the portfolio budget is tight, and that the penalty coefficients should be equal to the organizational opportunity costs when the portfolio budget is sufficient.

Item: Mass Spectrum Analysis of a Substance Sample Placed into Liquid Solution (North Dakota State University, 2011) Wang, Yunli
Mass spectrometry is an analytical technique commonly used for determining the elemental composition of a substance sample. For this purpose, the sample is placed into a liquid solution called the liquid matrix. Unfortunately, the spectrum of the sample is not observable separately from that of the solution.
Thus, it is desirable to isolate the sample spectrum. The analysis is usually based on comparing the mixed spectrum with that of the solution alone. By introducing the missing information about the origin of the observed spectrum peaks, the author obtains a classic setup for the Expectation-Maximization (EM) algorithm. The author proposed a mixture model for the spectrum of the liquid solution as well as that of the sample. A bell-shaped probability mass function, obtained by discretizing the univariate Gaussian probability density function, was proposed to serve as the mixture component. The E- and M-steps were derived under the proposed model. The corresponding R program was written and tested on a small but challenging simulation example. Varying the number of mixture components for the liquid matrix and the sample, the author found the correct model according to the Bayesian Information Criterion. The initialization of the EM algorithm is a difficult standalone problem that was successfully resolved for this case. The author presents the findings and provides results from the simulation example, along with illustrations supporting the conclusions.

Item: Comparison of Proposed K Sample Tests with Dietz's Test for Nondecreasing Ordered Alternatives for Bivariate Normal Data (North Dakota State University, 2011) Zhao, Yanchun
There are many situations in which researchers want to consider a set of response variables simultaneously rather than just one. For instance, a researcher may wish to determine the effects of an exercise and diet program on both the cholesterol levels and the weights of obese subjects. Dietz (1989) proposed two multivariate generalizations of the Jonckheere test for ordered alternatives. In this study, we propose k-sample tests for nondecreasing ordered alternatives for bivariate normal data and compare their powers with Dietz's sum statistic.
The proposed k-sample tests are based on transformations of the bivariate data to univariate data. The transformations considered are the sum, maximum, and minimum functions. The ideas for these transformations come from Leconte, Moreau, and Lellouch (1994). After the underlying bivariate normal data are reduced to univariate data, the Jonckheere-Terpstra (JT) test (Terpstra, 1952; Jonckheere, 1954) and the Modified Jonckheere-Terpstra (MJT) test (Tryon and Hettmansperger, 1973) are applied to the univariate data. A simulation study is conducted to compare the proposed tests with Dietz's test for k bivariate normal populations (k=3, 4, 5). A variety of sample sizes and various location shifts are considered, along with two different correlations for the bivariate normal distributions. The simulation results show that the Dietz test generally performs best for the situations considered with an underlying bivariate normal distribution. The estimated powers of the MJT-sum and JT-sum tests are often close, with the MJT-sum generally having slightly higher power. The sum transformation was the best of the three transformations for bivariate normal data.

Item: Power Analysis to Determine the Importance of Covariance Structure Choice in Mixed Model Repeated Measures ANOVA (North Dakota State University, 2017) King, Taylor J.
Repeated measures experiments involve multiple subjects with measurements taken on each subject over time. We used SAS to conduct a simulation study to see how different methods of analysis perform under various simulation parameters (e.g., sample size, autocorrelation, number of repeated measures). Our goals were to: compare the multivariate analysis of variance method using PROC GLM to the mixed model method using PROC MIXED in terms of power, determine how choosing an incorrect covariance structure for the mixed model analysis affects power, and identify the sample sizes needed to produce adequate power of 90 percent under different scenarios.
The findings support using the mixed model method over the multivariate method because power is generally higher with the mixed model method. Simpler covariance structures may be preferred when testing the within-subjects effect to obtain high power. Additionally, these results can be used as a guide for determining the sample size needed for adequate power.

Item: Examining Influential Factors and Predicting Outcomes in European Soccer Games (North Dakota State University, 2013) Melnykov, Yana
Models are developed using least squares regression and logistic regression to predict the outcomes of European soccer games based on four variables related to each team's past k games, with the following values of k considered: 4, 6, 8, 10, and 12. Games from the European soccer leagues of England, Italy, and Spain are considered for the 2011-2012 season. Each league has 20 teams, and each pair of teams plays two games: one at home and one away. There are 38 rounds in each league. The first 33 rounds are used to develop models to predict the outcomes of games, and predictions are made for the last 5 rounds in each league. We were able to correctly predict 76% of the results for the last 5 rounds using the linear regression model and 77% using the logistic regression model.

Item: Investigating Statistical vs. Practical Significance of the Kolmogorov-Smirnov Two-Sample Test Using Power Simulations and Resampling Procedures (North Dakota State University, 2018) Larson, Lincoln Gary
This research examines the power of the Kolmogorov-Smirnov two-sample test. The motivation for this research is a large data set containing soil salinity values. One problem encountered was that the power of the Kolmogorov-Smirnov two-sample test became extremely high due to the large sample size. This extreme power resulted in statistically significant differences between two distributions when no practically significant difference was present.
This research used resampling procedures to create simulated null distributions for the test statistic. These null distributions were used to obtain power approximations for the Kolmogorov-Smirnov test under differing effect sizes. The research shows that the power of the Kolmogorov-Smirnov test can become very large in cases of large sample sizes.
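The large-sample behavior described in the last abstract is easy to reproduce with a Monte Carlo power estimate. The sketch below is illustrative only, not code from the thesis: the normal distributions, the 0.05-standard-deviation shift, and the sample sizes are assumptions chosen to show the effect, and scipy's standard `ks_2samp` p-value stands in for the thesis's resampled null distributions.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)

def ks_power(n, shift, n_sims=500, alpha=0.05):
    """Estimate the power of the two-sample KS test for a small
    location shift by simulating n_sims pairs of samples and
    counting how often the test rejects at level alpha."""
    rejections = 0
    for _ in range(n_sims):
        x = rng.normal(0.0, 1.0, n)        # reference sample
        y = rng.normal(shift, 1.0, n)      # shifted sample
        if ks_2samp(x, y).pvalue < alpha:
            rejections += 1
    return rejections / n_sims

# A 0.05 sd shift is practically negligible, yet:
# with n = 100 per group the estimated power stays near the 5% level,
# while with n = 20,000 per group the test rejects most of the time.
print(ks_power(100, 0.05))
print(ks_power(20_000, 0.05, n_sims=100))
```

Running both calls makes the statistical-vs-practical-significance point concrete: the same tiny effect is invisible at modest sample sizes but almost always flagged as "significant" at large ones.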