Building a Predictive Model on State of Good Repair by Machine Learning Algorithm on Public Transportation Rolling Stock
Achieving and maintaining public transportation rolling stock in a state of good repair is crucial to providing safe and reliable service to riders. In addition, transit agencies that seek federal grants must keep their transit assets in a state of good repair. Transit agencies therefore need a predictive model that analyzes their rolling stock, determines its current condition, and predicts when vehicles need to be replaced or rehabilitated. Because many transit agencies lack good analytical tools for predicting the service life of vehicles, a simple predictive model would be a valuable resource for assessing their state of good repair needs and for prioritizing capital spending on replacement and rehabilitation. The ability to accurately predict the service life of revenue vehicles is crucial to achieving a state of good repair. In this dissertation, three tree-based machine learning methods were applied to build three predictive models: random forest regression, gradient boosting regression, and decision tree regression. After the performance of all three models was evaluated and compared, the gradient boosting regression model trained on the 30 most important features was found to be the best fit for predicting the service life of transit vehicles. This model can be used to predict the projected retirement year for all vehicles in operation nationwide, for the fleet of a single transit agency, or for any single vehicle. Revenue vehicle inventory data from the National Transit Database (NTD) were used to build the model. Before the data were fed into the model, a variety of new features were created, missing data were fixed, and extreme values or outliers were handled.
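The model comparison described in the abstract can be illustrated with a minimal sketch. It assumes scikit-learn and uses synthetic data rather than NTD records. The feature matrix, the target definition, the error metric (mean absolute error), the hyperparameters, and the in-service year are all hypothetical choices made for illustration, not the dissertation's actual setup.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for a vehicle inventory table (NOT NTD data).
rng = np.random.default_rng(0)
n_vehicles, n_features = 400, 40
X = rng.normal(size=(n_vehicles, n_features))
# Hypothetical target: service life in years, driven by the first three columns.
y = (14 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + 1.0 * X[:, 2]
     + rng.normal(scale=0.5, size=n_vehicles))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# The three tree-based regressors named in the abstract, with assumed settings.
models = {
    "decision_tree": DecisionTreeRegressor(random_state=0),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
}

# Fit each model on all features and record its held-out mean absolute error.
mae_all = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    mae_all[name] = mean_absolute_error(y_test, model.predict(X_test))

# Rank features by the fitted gradient boosting importances and keep the top 30.
TOP_K = 30
importances = models["gradient_boosting"].feature_importances_
top_idx = np.argsort(importances)[::-1][:TOP_K]

# Refit gradient boosting on the reduced feature set.
gbr_top = GradientBoostingRegressor(random_state=0)
gbr_top.fit(X_train[:, top_idx], y_train)
mae_top = mean_absolute_error(y_test, gbr_top.predict(X_test[:, top_idx]))

# Projected retirement year for one vehicle: in-service year + predicted life.
in_service_year = 2012  # hypothetical value
predicted_life = float(gbr_top.predict(X_test[:1][:, top_idx])[0])
projected_retirement_year = in_service_year + round(predicted_life)

for name, mae in mae_all.items():
    print(f"{name}: MAE = {mae:.2f} years (all features)")
print(f"gradient_boosting: MAE = {mae_top:.2f} years (top {TOP_K} features)")
print(f"projected retirement year: {projected_retirement_year}")
```

In this sketch the importance ranking is taken from the gradient boosting model itself. Ranking by another model's importances or by permutation importance would be an equally reasonable choice, and the abstract does not say which was used.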