Hillel, T., Bierlaire, M., Elshafie, M., and Ying, J. (2018)

Validation of probabilistic classifiers

18th Swiss Transport Research Conference, Ascona, Switzerland

Non-parametric probabilistic classification models are increasingly being investigated as an alternative to Discrete Choice Models (DCMs), e.g. for predicting mode choice. The literature offers many strategies for model selection between DCMs, either through testing a null hypothesis (e.g. likelihood ratio, Wald, and Lagrange multiplier tests) or through comparing information criteria (e.g. the Bayesian and Akaike information criteria). However, these tests are only valid for parametric models and cannot be applied to non-parametric classifiers. Typically, the performance of Machine Learning (ML) classifiers is validated by computing a performance metric on out-of-sample test data, either through cross-validation or hold-out testing. Whilst bootstrapping can be used to investigate whether differences between test scores are stable under resampling, few studies in the literature investigate whether these differences are statistically significant for non-parametric models. To address this, we introduce three statistical tests which can be applied to both parametric and non-parametric probabilistic classification models. The first test considers the analytical distribution of the expected likelihood of a model given the true model. The second test uses a similar analysis to determine the distribution of the Kullback-Leibler divergence between two models. The final test considers the convex combination of the two classifiers under comparison. These tests allow ML classifiers to be compared directly, both with each other and with DCMs.
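As a point of reference for the bootstrapping practice mentioned above, the following is a minimal sketch of how the difference in out-of-sample log-likelihood between two probabilistic classifiers might be resampled. It is not one of the three tests introduced in the paper. The function names, the array layout (one row of predicted class probabilities per test observation), and the toy data are all assumptions made for illustration.

```python
import numpy as np


def mean_log_likelihood(probs, y):
    """Mean log-probability a model assigns to the observed classes.

    probs: (n, k) array of predicted class probabilities (hypothetical layout).
    y:     (n,) array of observed class indices in 0..k-1.
    """
    n = len(y)
    return np.mean(np.log(probs[np.arange(n), y]))


def bootstrap_ll_difference(probs_a, probs_b, y, n_boot=1000, seed=0):
    """Bootstrap the mean per-observation log-likelihood difference (A minus B).

    Test observations are resampled with replacement; the returned array holds
    one mean difference per bootstrap replicate.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    rows = np.arange(n)
    # Per-observation log-likelihood difference between the two models
    diff = np.log(probs_a[rows, y]) - np.log(probs_b[rows, y])
    # Each row of idx is one bootstrap resample of the test set
    idx = rng.integers(0, n, size=(n_boot, n))
    return diff[idx].mean(axis=1)


# Toy example (fabricated for illustration only): three classes, six observations.
y = np.array([0, 1, 2, 0, 1, 2])
probs_a = np.full((6, 3), 0.1)
probs_a[np.arange(6), y] = 0.8          # model A puts 0.8 on the observed class
probs_b = np.full((6, 3), 1.0 / 3.0)    # model B predicts uniformly

diffs = bootstrap_ll_difference(probs_a, probs_b, y, n_boot=200)
lower, upper = np.percentile(diffs, [2.5, 97.5])
```

A percentile interval such as `(lower, upper)` indicates whether the score difference is stable under resampling of the test set. As the abstract notes, that is not the same as a formal significance test, which is the gap the proposed tests aim to address.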