Hillel, T. (2020)
New perspectives on the performance of machine learning classifiers for mode-choice prediction
It appears to be a commonly held belief that Machine Learning (ML) classification algorithms should achieve substantially higher predictive performance than manually specified Random Utility Models (RUMs) for choice modelling. This belief is supported by several papers in the mode choice literature, which highlight stand-out performance of non-linear ML classifiers compared with linear models. However, many studies that compare ML classifiers with linear models have a fundamental flaw in how they validate models on out-of-sample data.

This paper investigates the implications of this issue by comparing out-of-sample validation using two different sampling methods for panel data: (i) trip-wise sampling, where validation folds are sampled independently from all trips in the dataset, and (ii) grouped sampling, where validation folds are sampled grouped by household/person. The paper includes two linked investigations: (i) a dataset investigation, which quantifies the proportion of matching trips across training and validation data when using trip-wise sampling for Out-Of-Sample (OOS) validation, and (ii) a modelling investigation, which compares OOS validation results obtained using trip-wise sampling and grouped sampling. These investigations make use of the data and methodologies of three published studies which explore ML classification of mode choice.

The results of the dataset investigation indicate that using trip-wise sampling with travel diary data results in significant data leakage, with up to 96% of the trips in typical trip-wise validation folds having matching trips with the same mode choice in the training data. Furthermore, the modelling investigation demonstrates that this data leakage introduces substantial bias in model performance estimates, particularly for flexible non-linear classifiers. Grouped sampling is found to address the issues associated with trip-wise sampling and to provide reliable estimates of true OOS predictive performance.
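The leakage mechanism can be illustrated with a small simulation. The sketch below uses hypothetical data (not the paper's travel diary datasets): each household contributes several effectively identical trips, and a trip-wise random split then places near-duplicates of most validation trips in the training fold.

```python
import random

# Hypothetical panel: each household contributes repeated trips with the
# same observed mode choice (e.g. the same daily commute).
random.seed(0)
trips = []
for hh in range(100):                      # 100 households
    mode = random.choice(["car", "bus", "walk", "cycle"])
    for _ in range(5):                     # 5 repeated trips per household
        trips.append({"household": hh, "mode": mode})

# Trip-wise sampling: the validation fold is drawn independently from all
# trips, ignoring which household each trip belongs to.
random.shuffle(trips)
split = int(0.8 * len(trips))
train, valid = trips[:split], trips[split:]

# A validation trip "leaks" if the same household appears in the training
# fold with the same observed mode.
train_keys = {(t["household"], t["mode"]) for t in train}
leaked = sum((t["household"], t["mode"]) in train_keys for t in valid)
print(f"{leaked / len(valid):.0%} of validation trips match a training trip")
```

A flexible classifier can score highly on such a validation fold simply by memorising household-specific patterns, which says nothing about performance on unseen travellers.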
The use of trip-wise sampling with panel data has led to incorrect conclusions in two of the investigated studies, with the original results substantially overstating the performance of ML models relative to linear Logistic Regression (LR) models. Whilst the results from this study indicate a slight predictive-performance advantage for non-linear classifiers (in particular Ensemble Learning (EL) models) over linear LR models, this advantage is far more modest than previous investigations have suggested.
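A minimal sketch of grouped sampling, using a hypothetical helper rather than code from the paper: whole households are held out together, so no validation trip shares a traveller with the training fold. In practice, scikit-learn's `GroupKFold` or `GroupShuffleSplit` implement the same idea when passed household/person identifiers as the `groups` argument.

```python
import random

def grouped_split(trips, valid_fraction=0.2, seed=0):
    """Grouped sampling: hold out entire households, so no validation
    trip shares a household with any training trip."""
    households = sorted({t["household"] for t in trips})
    rng = random.Random(seed)
    rng.shuffle(households)
    n_valid = max(1, int(valid_fraction * len(households)))
    valid_hh = set(households[:n_valid])
    train = [t for t in trips if t["household"] not in valid_hh]
    valid = [t for t in trips if t["household"] in valid_hh]
    return train, valid

# Hypothetical panel: 100 households, 5 repeated trips each.
trips = [{"household": hh, "mode": "car"}
         for hh in range(100) for _ in range(5)]
train, valid = grouped_split(trips)

# No household straddles the two folds, so repeated trips cannot leak.
assert not ({t["household"] for t in train} & {t["household"] for t in valid})
```

Because matching trips can no longer appear on both sides of the split, validation scores under grouped sampling estimate performance on genuinely unseen travellers.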