Model Validation and Diagnostics in Right Censored Regression
Abstract
When censored data are present in a linear regression setting, two algorithms can be used to fit the model: the Expectation-Maximization (EM) algorithm and the Buckley and James (BJ) method. We focus on the EM algorithm because it is easier to implement than the BJ algorithm and it relies on assumptions that are standard in regression theory, such as normally distributed errors. The BJ algorithm is used as a benchmark for the EM parameter estimates, their variability, and model selection. In this dissertation, validation and influence diagnostic tools are proposed for right censored regression using the EM algorithm. These tools include a reconstructed coefficient of determination, a test for outliers based on the reconstructed jackknife residual, and influence diagnostics with one-step deletion. To validate the proposed methods, extensive simulation studies compare the performance of the EM and BJ algorithms in parameter estimation across different error distributions, proportions of censored data, and sample sizes. A sensitivity analysis of the reconstructed coefficient of determination shows how the EM algorithm can be used in model validation for different amounts of censoring and locations of the censored data. Additional simulation studies show the capability of the EM algorithm to detect outliers of different types (uncensored and censored), under different proportions of censored data and different outlier locations. The proposed formula for the one-step deletion method is validated with an example and a simulation study. Additionally, this research proposes a novel application of the EM algorithm for modeling right censored regression in actuarial science. Both the EM and BJ algorithms are used to model health benefit data provided by the North Dakota Department of Veterans Affairs (ND DVA).
The proposed model validation and diagnostic tools are applied using the EM algorithm. The results of this study can benefit government policy makers and pricing actuaries.
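To make the EM approach concrete, the following is a minimal sketch of a textbook EM iteration for a simple linear model with normally distributed errors and right censored responses. It is not taken from the dissertation, whose exact formulation is not given in this abstract. The function names (`ols_line`, `em_censored_line`, `simulate_right_censored`), the single covariate, the fixed censoring threshold, and the simulated parameter values are all assumptions made for illustration. In the E-step each censored response is replaced by its conditional mean given that it exceeds the censoring value, and in the M-step the coefficients are refit by least squares while the variance update adds the conditional variance of the censored responses.

```python
import math
import random


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_sf(z):
    # Survival function 1 - Phi(z), computed with the complementary error function.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def ols_line(x, y):
    # Ordinary least squares for y = b0 + b1 * x (one covariate).
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - b1 * xbar, b1


def em_censored_line(x, t, delta, n_iter=500, tol=1e-8):
    """EM for y = b0 + b1*x + e, e ~ N(0, s2), with right censoring.

    t[i] is the recorded value; delta[i] is 1 if y[i] was observed exactly
    and 0 if y[i] is only known to exceed t[i].
    """
    n = len(x)
    # Start from the naive fit that treats censored values as exact.
    b0, b1 = ols_line(x, t)
    s2 = sum((ti - b0 - b1 * xi) ** 2 for xi, ti in zip(x, t)) / n
    for _ in range(n_iter):
        s = math.sqrt(s2)
        y_hat, var = [], []
        # E-step: conditional mean and variance of each censored response.
        for xi, ti, di in zip(x, t, delta):
            if di:
                y_hat.append(ti)
                var.append(0.0)
            else:
                mu = b0 + b1 * xi
                z = (ti - mu) / s
                lam = norm_pdf(z) / max(norm_sf(z), 1e-300)  # normal hazard
                y_hat.append(mu + s * lam)
                var.append(max(s2 * (1.0 + z * lam - lam * lam), 0.0))
        # M-step: refit the line, then update the error variance.
        nb0, nb1 = ols_line(x, y_hat)
        ns2 = sum((yh - nb0 - nb1 * xi) ** 2 + vi
                  for xi, yh, vi in zip(x, y_hat, var)) / n
        change = abs(nb0 - b0) + abs(nb1 - b1) + abs(ns2 - s2)
        b0, b1, s2 = nb0, nb1, ns2
        if change < tol:
            break
    return b0, b1, s2


def simulate_right_censored(n, seed, b0=1.0, b1=2.0, sigma=1.0, limit=6.0):
    # Hypothetical data: responses above `limit` are recorded as `limit`.
    rng = random.Random(seed)
    x = [rng.uniform(0.0, 3.0) for _ in range(n)]
    y = [b0 + b1 * xi + rng.gauss(0.0, sigma) for xi in x]
    t = [min(yi, limit) for yi in y]
    delta = [1 if yi <= limit else 0 for yi in y]
    return x, t, delta


if __name__ == "__main__":
    x, t, delta = simulate_right_censored(2000, seed=1)
    print("naive OLS (b0, b1):", ols_line(x, t))
    print("EM (b0, b1, s2):   ", em_censored_line(x, t, delta))
```

Run as a script, this prints the naive least squares fit next to the EM fit for one simulated data set. Because the simulated censoring removes the largest responses, the naive slope is expected to be attenuated relative to the EM slope. The BJ method is not sketched here, since it replaces the normal assumption with a nonparametric estimate of the error distribution.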