Ortelli, N., Hillel, T., Pereira, F. C., de Lapparent, M., and Bierlaire, M. (2020)

Variable Neighborhood Search for Assisted Utility Specification in Discrete Choice Models

9th Symposium of the European Association for Research in Transportation, Lyon, France

In the last 40 years, transportation demand modeling has almost exclusively been tackled using discrete choice models (DCMs). This is due to their high interpretability, which makes it possible to verify their compliance with well-established behavioral theory. However, the development of DCMs through manual specification is laborious. The predominant approach is to include a priori a certain number of variables that are regarded as essential in the specification of the model; incremental changes are then tested in order to improve its goodness of fit while ensuring its behavioral realism (Koppelman and Bhat, 2006). Because the set of candidate specifications becomes unmanageably large even with a moderate number of variables under consideration, such theory-driven approaches are time-consuming and prone to errors. Modelers tend to rely on common sense or intuition without further validation of the supposedly prevailing constructs they prioritize, while the implications of working with incorrectly specified models and possibly biased parameters are largely underestimated (Torres et al., 2011; Van Der Pol et al., 2014). This issue, worsened by the advent of big data and the need to analyze ever-larger datasets, has driven an increasing focus on machine learning (ML) as a way of relieving the modeler of the burden of model specification. In recent years, numerous studies have investigated the usefulness of ML classifiers as an alternative to DCMs by comparing logit models with methods such as decision trees (Tang et al., 2015; Lhéritier et al., 2019), support vector machines (Zhang and Xie, 2008; Paredes et al., 2017) or neural networks (Zhao et al., 2018; Lee et al., 2018). These studies indicate that logit models are outperformed by ML classifiers in terms of prediction accuracy (Hagenauer and Helbich, 2017; Wang and Ross, 2018); however, ML classifiers suffer from a crucial limitation: they lack interpretability.
The goal of DCMs is to accurately predict the choices of a population in a particular context, but the estimated values of the parameters are equally important: DCMs have strong behavioral foundations that originate in random utility theory (McFadden, 1974), and their mathematical structure makes it possible to understand the decision processes in addition to predicting their outcome. DCMs may predict less accurately than their ML counterparts, but they provide valuable insights into the underlying process that individuals follow when making choices.
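
To make the interpretability argument concrete, the following is a minimal sketch of a multinomial logit model, the standard DCM derived from random utility theory under the assumption of i.i.d. extreme value error terms. The alternatives, the attribute values and the parameters BETA_TIME, BETA_COST and ASC_CAR are hypothetical and chosen for illustration only; they are not estimates from this work.

```python
import math

# Hypothetical parameters (illustration only, not estimated values).
BETA_TIME = -0.05  # change in utility per additional minute of travel time
BETA_COST = -0.20  # change in utility per additional unit of travel cost
ASC_CAR = 0.5      # alternative-specific constant for the car alternative


def systematic_utility(time, cost, asc=0.0):
    """Linear-in-parameters systematic utility of one alternative."""
    return asc + BETA_TIME * time + BETA_COST * cost


def logit_probabilities(utilities):
    """Multinomial logit choice probabilities from systematic utilities."""
    # Subtract the maximum utility before exponentiating for numerical stability.
    v_max = max(utilities.values())
    exp_v = {alt: math.exp(v - v_max) for alt, v in utilities.items()}
    total = sum(exp_v.values())
    return {alt: e / total for alt, e in exp_v.items()}


# Hypothetical choice situation with two alternatives.
utilities = {
    "car": systematic_utility(time=20, cost=4.0, asc=ASC_CAR),
    "bus": systematic_utility(time=35, cost=1.5),
}
probabilities = logit_probabilities(utilities)
```

Each parameter has a direct behavioral reading: in this sketch, the negative signs of BETA_TIME and BETA_COST state that longer or more expensive trips are less attractive, which is the kind of check against behavioral theory that a black-box classifier does not readily offer.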