Farooq, B., Bierlaire, M., and Flötteröd, G.

Simulation based Population Synthesis

Speaker: Farooq Bilal

Seventh workshop on discrete choice models, EPFL

August 26, 2011

Microsimulation of transportation and land use evolution requires, for the base year, the individual characteristics and disaggregate locations of the households and persons living in the study area. At best, however, the census and travel surveys, which are the primary data sources, provide only cross-tabulations at various levels of spatial aggregation (sector, commune, region, and country) and a small sample of individual-level information (microdata) that usually has no spatial information attached to it. The baseline population must therefore be generated synthetically. Currently, variants of Iterative Proportional Fitting (IPF) are predominantly used to generate the base-year synthetic population. IPF essentially creates clones of the individual household and person records in the microdata in such a way that the marginals at one or more levels of spatial aggregation are satisfied. In doing so, IPF ensures that the correlation structure of the sample is preserved in the synthesized population. The key shortcomings of IPF include: a) loss of heterogeneity that may not have been captured in the microdata, because the population is cloned rather than truly synthesized; b) over-reliance on the accuracy of the data to determine the cloning weights; c) very poor scalability as the number of population characteristics to be synthesized increases. To overcome these shortcomings and move research in population synthesis for microsimulation significantly forward, we are developing a Markov Chain Monte Carlo (MCMC) simulation-based approach that at its core uses Gibbs and Metropolis-Hastings sampling methods. Instead of cloning the microdata, this approach generates the joint distribution of the characteristics of the households, the persons, and the associations between them, using any available data on these three dimensions.
The resulting joint distribution is thus the best possible representation of the real population, given all the available information. The synthetic population is then generated as a realization drawn from this joint distribution. In this way, population synthesis can also become a seamless part of these microsimulations and can thus be included in their sensitivity analysis. In terms of implementation progress, we are currently developing C++ code and testing the methodology by generating a synthetic population for Brussels. The initial results show a good fit.
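
To make the sampling idea concrete, the following is a minimal Python sketch of a Gibbs sampler over two categorical attributes. It is an illustration only, not the C++ implementation mentioned above. The attribute names (age group, car ownership), the probabilities in `JOINT`, and the helper names are all hypothetical. The joint table is included only so that the conditionals can be derived and the result checked; in a real application the conditionals would presumably be estimated from the available cross-tabulations and microdata, and the joint would be unknown.

```python
import random
from collections import Counter

# Hypothetical target joint over (age_group, car_ownership). It is used here
# only to derive the conditionals and to check the sampler afterwards.
JOINT = {
    ("young", "no_car"): 0.20, ("young", "car"): 0.10,
    ("middle", "no_car"): 0.10, ("middle", "car"): 0.30,
    ("old", "no_car"): 0.15, ("old", "car"): 0.15,
}
AGES = ["young", "middle", "old"]
CARS = ["no_car", "car"]


def conditional(values, weight_of):
    """Normalise the weights of `values` into a list of probabilities."""
    weights = [weight_of(v) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def gibbs_sample(n_draws, burn_in=1000, seed=0):
    """Draw (age, car) pairs using only the two full conditionals."""
    rng = random.Random(seed)
    age, car = "young", "no_car"  # arbitrary starting state
    draws = []
    for step in range(burn_in + n_draws):
        # Update age from P(age | car), then car from P(car | age).
        p_age = conditional(AGES, lambda a: JOINT[(a, car)])
        age = rng.choices(AGES, weights=p_age)[0]
        p_car = conditional(CARS, lambda c: JOINT[(age, c)])
        car = rng.choices(CARS, weights=p_car)[0]
        if step >= burn_in:  # discard the initial transient
            draws.append((age, car))
    return draws


draws = gibbs_sample(50_000)
counts = Counter(draws)
# Empirical cell frequencies of the synthetic "population".
empirical = {cell: counts[cell] / len(draws) for cell in JOINT}
```

Each draw in `draws` plays the role of one synthetic individual, and comparing `empirical` against `JOINT` gives a rough check that the chain reproduces the target distribution. The sketch omits the household-person associations and the Metropolis-Hastings steps mentioned in the abstract.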