Mining Connected Vehicle Data for Beneficial Patterns in Dubai Taxi Operations
Author/Creator
Bridgelall, Raj
Lu, Pan
Tolliver, Denver D.
Xu, Tie
Abstract
On-demand shared mobility services such as Uber and micro-transit are steadily penetrating the
worldwide market for traditional dispatched taxi services. Hence, taxi companies are seeking
ways to compete. This study mined large-scale mobility data from connected taxis to discover
beneficial patterns that may inform strategies to improve the dispatched taxi business. It is not practical
to manually clean and filter large-scale mobility data that contains GPS information. Therefore,
this research contributes and demonstrates an automated data cleaning and filtering method
that is suitable for such datasets. The cleaning method defines three filter variables and
applies a layered statistical filtering technique to eliminate outlier records that do not contribute
to distributions that match expected theoretical distributions of the variables. Chi-squared
statistical tests evaluate the quality of the cleaned data by comparing the distribution of the three
variables with their expected distributions. The overall cleaning method removed approximately
5% of the data, which consisted of obvious errors and poor-quality
outliers. Subsequently, mining the cleaned data revealed that trip production in Dubai peaks
when the same two drivers, and only those two, operate the same taxi. This finding would not have been
possible without access to proprietary data that contains unique identifiers for both drivers and
taxis. Datasets that identify individual drivers are not publicly available.
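
The layered filtering idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the three filter variables or the outlier rule, so the variable names (distance_km, duration_min, speed_kmh), the synthetic records, and the use of Tukey interquartile-range fences are all hypothetical stand-ins. Each layer filters on one variable and passes only the surviving records to the next layer. A Pearson chi-squared statistic then compares binned counts of a cleaned variable with the counts an expected distribution would give.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return (low, high) Tukey fences; an assumed outlier rule, not the paper's."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    return q1 - k * spread, q3 + k * spread


def layered_filter(records, variables, k=1.5):
    """Filter one variable at a time; each layer sees only the survivors."""
    kept = list(records)
    for name in variables:
        low, high = iqr_bounds([r[name] for r in kept], k)
        kept = [r for r in kept if low <= r[name] <= high]
    return kept


def chi_squared_statistic(observed, expected):
    """Pearson chi-squared statistic for binned observed vs. expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def make_record(distance_km, duration_min):
    """Build a synthetic trip record; speed is derived from the other two fields."""
    return {
        "distance_km": distance_km,
        "duration_min": duration_min,
        "speed_kmh": 60.0 * distance_km / duration_min,
    }


# Synthetic trips: eight plausible records plus three deliberate outliers.
records = [make_record(d, d + 6.0) for d in (4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0)]
records.append(make_record(900.0, 14.0))  # implausible distance (e.g., a GPS jump)
records.append(make_record(7.0, 600.0))   # implausible duration
records.append(make_record(10.0, 10.0))   # plausible distance and time, implausible speed

cleaned = layered_filter(records, ["distance_km", "duration_min", "speed_kmh"])
print(len(records), "records in,", len(cleaned), "records kept")

# Compare hypothetical binned counts of one variable with expected counts.
print(chi_squared_statistic([18, 22, 20], [20, 20, 20]))
```

The order of the layers matters in this sketch: removing the extreme distance first keeps it from inflating the fences computed for the later variables. In practice the chi-squared statistic would be compared against a critical value for the appropriate degrees of freedom.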