Title:
Goodness of fit in DCMs
Supervisor(s):
Gael Lederrey, Michel Bierlaire, Nicholas Molyneaux
Description:
In recent years, with the arrival and spread of big data, the discrete choice modeling community has gained access to larger datasets, more computing power and the possibility of drastically increasing model complexity. It has therefore become crucial to establish precise measures of goodness of fit, as well as techniques to detect and reduce overfitting. We propose to apply known techniques from machine learning, such as cross-validation, to discrete choice models.

We investigate the existence of overfitting in discrete choice models using a linear and a polynomial multinomial logit model on the Optima dataset. We adopted a train-test approach to evaluate model performance: estimates for both models were computed on N = 100, 200 and 500 runs of random train-test splits, using an 80-20 ratio. The results show that the polynomial model fits the training samples better but performs worse than the linear model on the testing samples, indicating the presence of overfitting.

The use of K-fold cross-validation for simple multinomial logit models has also been explored. However, the model estimates obtained with K-fold cross-validation did not differ from those obtained from a single model fit, due to underfitting.

The distributions of the results obtained on the Optima dataset motivated the exploration of an empirical hypothesis test for the presence of overfitting. The test is based on the assumption that the training and testing results of a model that is not overfitting will have the same mean. It was implemented as a two-sample t-test with unequal variances, also known as Welch's t-test. The empirical test presented in this report did not differentiate the models in terms of overfitting. However, it could be a starting point for our future research, which will aim to establish a statistical definition of overfitting.
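The repeated train-test protocol and the Welch test described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the project's code: the Optima dataset is replaced by synthetic choices from a three-alternative logit model with hypothetical coefficients, the "polynomial" specification (attributes, their squares and their interaction) is an assumed example, fit is measured as log-likelihood per observation, and estimation uses SciPy's BFGS optimizer rather than a dedicated discrete choice package. The function names `simulate_choices`, `polynomial_features`, `fit_mnl`, `mean_ll` and `repeated_train_test` are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import ttest_ind


def simulate_choices(n_obs, rng, beta=(-1.0, 0.5), n_alt=3):
    """Synthetic stand-in for the Optima data (assumption): X has shape (n_obs, n_alt, n_attr)."""
    beta = np.asarray(beta)
    X = rng.normal(size=(n_obs, n_alt, beta.size))
    utility = X @ beta + rng.gumbel(size=(n_obs, n_alt))  # i.i.d. Gumbel errors give a logit model
    return X, utility.argmax(axis=1)


def polynomial_features(X):
    """Hypothetical polynomial specification: attributes, their squares and their interaction."""
    return np.concatenate([X, X**2, X[..., :1] * X[..., 1:2]], axis=-1)


def neg_log_likelihood(beta, X, y):
    """Negative MNL log-likelihood and its gradient, with utilities V = X @ beta."""
    V = X @ beta
    V = V - V.max(axis=1, keepdims=True)  # shift for numerical stability
    log_p = V - np.log(np.exp(V).sum(axis=1, keepdims=True))
    rows = np.arange(len(y))
    expected_x = np.einsum("nj,njk->nk", np.exp(log_p), X)  # probability-weighted attributes
    grad = (X[rows, y] - expected_x).sum(axis=0)
    return -log_p[rows, y].sum(), -grad


def fit_mnl(X, y):
    """Maximum likelihood estimation with BFGS, starting from zero coefficients."""
    start = np.zeros(X.shape[-1])
    return minimize(neg_log_likelihood, start, args=(X, y), jac=True, method="BFGS").x


def mean_ll(beta, X, y):
    """Log-likelihood per observation, so that 80% and 20% samples are comparable."""
    return -neg_log_likelihood(beta, X, y)[0] / len(y)


def repeated_train_test(n_runs=100, n_obs=150, train_ratio=0.8, seed=0):
    """Fit both specifications on n_runs random train-test splits of the same data."""
    rng = np.random.default_rng(seed)
    X, y = simulate_choices(n_obs, rng)
    specs = {"linear": X, "polynomial": polynomial_features(X)}
    scores = {name: {"train": [], "test": []} for name in specs}
    cut = int(train_ratio * n_obs)
    for _ in range(n_runs):
        idx = rng.permutation(n_obs)  # a new random 80-20 split at each run
        train, test = idx[:cut], idx[cut:]
        for name, Z in specs.items():
            beta = fit_mnl(Z[train], y[train])
            scores[name]["train"].append(mean_ll(beta, Z[train], y[train]))
            scores[name]["test"].append(mean_ll(beta, Z[test], y[test]))
    return {name: {k: np.array(v) for k, v in s.items()} for name, s in scores.items()}


scores = repeated_train_test(n_runs=100)  # the project used N = 100, 200 and 500
for name, s in scores.items():
    # Welch's t-test: do the training and testing results share the same mean?
    welch = ttest_ind(s["train"], s["test"], equal_var=False)
    print(f"{name:>10}: train {s['train'].mean():.3f}  test {s['test'].mean():.3f}  "
          f"t = {welch.statistic:.2f}  p = {welch.pvalue:.3g}")
```

Because the polynomial specification nests the linear one, its training log-likelihood on a given split can never be lower; overfitting shows up as a training advantage that vanishes or reverses on the testing samples. `ttest_ind(..., equal_var=False)` is SciPy's implementation of Welch's t-test.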
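K-fold cross-validation for a simple multinomial logit model can be sketched in the same spirit. This is an illustrative sketch only: it uses synthetic data with hypothetical coefficients in place of the Optima dataset, and the helpers `fit_mnl` and `k_fold_estimates` are hypothetical names built on SciPy's BFGS optimizer. It compares the coefficient estimates from each of the K training folds with the estimate from a single fit on the full sample, which is the comparison described above, and also reports the held-out log-likelihood per observation of each fold.

```python
import numpy as np
from scipy.optimize import minimize


def neg_log_likelihood(beta, X, y):
    """Negative MNL log-likelihood and gradient; X has shape (n_obs, n_alt, n_attr)."""
    V = X @ beta
    V = V - V.max(axis=1, keepdims=True)  # shift for numerical stability
    log_p = V - np.log(np.exp(V).sum(axis=1, keepdims=True))
    rows = np.arange(len(y))
    grad = (X[rows, y] - np.einsum("nj,njk->nk", np.exp(log_p), X)).sum(axis=0)
    return -log_p[rows, y].sum(), -grad


def fit_mnl(X, y):
    """Maximum likelihood estimation with BFGS, starting from zero coefficients."""
    start = np.zeros(X.shape[-1])
    return minimize(neg_log_likelihood, start, args=(X, y), jac=True, method="BFGS").x


def k_fold_estimates(X, y, k=5, seed=0):
    """Estimate the model on each of the k training folds.

    Returns the per-fold coefficient estimates and the held-out
    log-likelihood per observation of each fold.
    """
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), k)
    betas, held_out_ll = [], []
    for i, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        beta = fit_mnl(X[train], y[train])
        betas.append(beta)
        held_out_ll.append(-neg_log_likelihood(beta, X[test], y[test])[0] / len(test))
    return np.array(betas), np.array(held_out_ll)


# Synthetic data in place of the Optima dataset (hypothetical true coefficients -1.0 and 0.5).
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3, 2))
y = (X @ np.array([-1.0, 0.5]) + rng.gumbel(size=(1000, 3))).argmax(axis=1)

full = fit_mnl(X, y)
betas, held_out_ll = k_fold_estimates(X, y, k=5)
print("single fit on all data:", full.round(3))
print("mean of fold estimates:", betas.mean(axis=0).round(3))
print("std of fold estimates :", betas.std(axis=0).round(3))
print("held-out LL per obs   :", held_out_ll.round(3))
```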
Collaboration with:
Type:
semester project
Prerequisites:
Good knowledge of Python is required. In addition, some general knowledge of Machine Learning and Discrete Choice Models (course: Mathematical Modelling of Behaviour) is highly recommended.
Submitted on:
February 11, 2021