Improved Monte Carlo Simulation in the Presence of Outliers Using Labeling and Bayesian Averaging
Abstract
Outliers are a long-standing problem in statistics. They appear with surprising frequency in datasets from both the natural and social sciences, and they can have both positive and negative effects on statistical analysis. In contrast to traditional approaches to handling outliers, this study models both the base distribution and the contaminating distribution that generates the outliers, and estimates the best-fitting distribution for each separately. Using the natural conjugate prior distribution for the probability of outlier occurrence, a ‘Bayesian averaging’ technique combines the two fitted distributions in a way that preserves most of the information in the full dataset. Kolmogorov-Smirnov (KS) and Anderson-Darling (AD) test statistics, computed by comparing the simulated distribution with the actual data distribution, serve as the comparative metrics. Analysis of seven sample datasets, each containing outliers, indicated that these alternative simulation procedures provided a better goodness of fit to the historical data than more traditional approaches.
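As an illustration only, the procedure summarized above can be sketched in Python. The sketch rests on several assumptions that the abstract does not specify: outliers are labeled with Tukey's fences, each component is fitted with a normal distribution as a placeholder for the best-fitting family, the prior on the outlier probability is Beta(1, 1), and its posterior mean is used as the mixing weight. The function names (`label_outliers`, `posterior_outlier_prob`, `simulate`, `ks_statistic`) and the sample values are hypothetical, and only the two-sample KS statistic is computed.

```python
import random
import statistics
from bisect import bisect_right


def label_outliers(data, k=1.5):
    """Flag points outside Tukey's fences (an assumed labeling rule)."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    spread = k * (q3 - q1)
    return [x < q1 - spread or x > q3 + spread for x in data]


def posterior_outlier_prob(n_out, n_total, a=1.0, b=1.0):
    """Posterior mean of the outlier probability under a Beta(a, b) prior."""
    return (a + n_out) / (a + b + n_total)


def simulate(data, n_draws, rng):
    """Draw from a two-component mixture fitted to the labeled data."""
    flags = label_outliers(data)
    base = [x for x, f in zip(data, flags) if not f]
    contam = [x for x, f in zip(data, flags) if f]
    p = posterior_outlier_prob(len(contam), len(data))
    # Normal fits are placeholders for the best-fitting family per component.
    mu_b, sd_b = statistics.mean(base), statistics.stdev(base)
    mu_c = statistics.mean(contam) if contam else mu_b
    sd_c = statistics.stdev(contam) if len(contam) > 1 else sd_b
    return [
        rng.gauss(mu_c, sd_c) if rng.random() < p else rng.gauss(mu_b, sd_b)
        for _ in range(n_draws)
    ]


def ks_statistic(x, y):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    return max(
        abs(bisect_right(xs, v) / len(xs) - bisect_right(ys, v) / len(ys))
        for v in xs + ys
    )


rng = random.Random(0)
# Made-up values for illustration; the last two act as outliers.
observed = [9.8, 10.1, 10.4, 9.9, 10.0, 10.2, 9.7, 10.3, 25.0, 27.5]
simulated = simulate(observed, 1000, rng)
print(ks_statistic(observed, simulated))
```

A smaller KS statistic indicates that the simulated sample tracks the observed data more closely, so the same function could be used to compare this mixture against a simulation fitted to the data with the outliers simply removed.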