Lederrey, G., Lurkin, V., Hillel, T., and Bierlaire, M. (2019)
Stochastic optimization with adaptive batch size: Discrete choice models as a case study
8th Symposium of the European Association for Research in Transportation, Budapest, Hungary
The 2.5 quintillion bytes of data created each day bring new opportunities, but also new and stimulating challenges, for the discrete choice community. Opportunities, because more and larger data sets will undoubtedly become available in the future, and with these new data comes the possibility to uncover new insights into customers' behaviors. Challenges, because insights can only be discovered if models can be estimated, which is far from simple on such large datasets. State-of-the-art algorithms, but also standard practices regarding model specification, may no longer be suited to these large data sets. Indeed, three state-of-the-art discrete choice software packages (Pandas Biogeme (Bierlaire, 2018), PyLogit (Brathwaite et al., 2017), and Larch (Newman et al., 2018)) rely on the standard optimization algorithm available in the Python package scipy, i.e., the BFGS algorithm. In contrast, extracting useful information from big data sets is at the core of Machine Learning (ML). Primarily interested in achieving high prediction accuracy, ML algorithms (and especially Neural Networks) have proved successful on models involving a huge number of parameters. Large-scale machine learning models thus often involve both large numbers of parameters and large datasets, and first-order stochastic methods are a natural choice for their optimization. Due to the high cost of computing the full Hessian, second-order methods have been much less explored. And yet, algorithms exploiting second-order information can provide faster convergence. For the sake of interpretability, discrete choice models usually have a more restricted set of parameters than models typically investigated in the ML community. We therefore argue that it is possible to use second-order information to estimate these models. In this paper, inspired by the good practices and the intensive use of stochastic gradient methods in the ML field, we introduce the Window Moving Average - Adaptive Batch Size (WMA-ABS) algorithm, which improves the efficiency of stochastic second-order methods. The objective of this paper is to investigate the convergence of our algorithm by benchmarking it against standard second-order and quasi-Newton methods using simple logit models on different datasets. We present preliminary results indicating that our algorithm outperforms the standard second-order methods, especially for large datasets. This constitutes a first step toward showing that stochastic algorithms can finally find their place in the optimization of Discrete Choice Models (DCMs).
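The abstract does not spell out the WMA-ABS update rule. As a rough illustrative sketch (not the authors' exact method), the idea of an adaptive batch size driven by a window moving average can be pictured as follows: the batch size is enlarged once the moving average of the objective over a recent window of iterations stops improving relative to the preceding window. All function names, window sizes, and thresholds below are hypothetical.

```python
import numpy as np

def wma_abs_step(log_like_history, batch_size, n_obs,
                 window=10, factor=2, tol=1e-4):
    """Illustrative adaptive-batch-size rule (hypothetical sketch,
    not the authors' exact WMA-ABS update).

    Compares the moving average of the objective over the last
    `window` iterations with the average over the `window` before
    that; if the relative improvement falls below `tol`, the batch
    size is enlarged (capped at the full dataset size `n_obs`).
    """
    if len(log_like_history) < 2 * window:
        return batch_size  # not enough history yet
    recent = np.mean(log_like_history[-window:])
    previous = np.mean(log_like_history[-2 * window:-window])
    # relative improvement of the windowed moving average
    improvement = abs(recent - previous) / (abs(previous) + 1e-12)
    if improvement < tol:
        batch_size = min(factor * batch_size, n_obs)
    return batch_size
```

Under such a scheme, early iterations run on small, cheap batches, and the stochastic second-order steps gradually approach the full-batch behavior of BFGS-like methods as estimation converges.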