Biogeme is an open-source Python package designed for the maximum likelihood estimation of parametric models in general, with a special emphasis on discrete choice models. It relies on Pandas, the Python Data Analysis Library.
Biogeme used to be a standalone software package, written in C++. All the material related to the previous versions of Biogeme is available on the old webpage.
BIOGEME is distributed free of charge. We ask each user
I would like to thank the following people who played various roles in the development of Biogeme over the years. The list is certainly not complete, and I apologize to those who are omitted: Alexandre Alahi, Nicolas Antille, Gianluca Antonini, Cristian Arteaga, Kay Axhausen, John Bates, Denis Bolduc, David Bunch, Andrew Daly, Anna Fernandez Antolin, Mamy Fetiarison, Mogens Fosgerau, Emma Frejinger, Carmine Gioia, Marie-Hélène Godbout, Stephane Hess, Tim Hillel, Richard Hurni, Eva Kazagli, Jasper Knockaert, Xinjun Lai, Gael Lederrey, Virginie Lurkin, Nicholas Molyneaux, Nicola Ortelli, Carolina Osorio, Meritxell Pacheco Paneque, Thomas Robin, Pascal Scheiben, Matteo Sorci, Ewout ter Hoeven, Michael Thémans, Joan Walker.
I would like to give special thanks to Moshe Ben-Akiva and Daniel McFadden for their friendship, and for the immense influence that they had and still have on my work.
Biogeme is an open-source Python package that relies on Python 3. Make sure that Python 3.x is installed on your computer. If you have never used Python before, you may want to consider a complete platform such as Anaconda.
If Python is already installed on your computer, verify the version. Two versions of Python are distributed: version 2 and version 3. Biogeme works only with version 3.
A significant part of Biogeme is coded in C++ for the sake of computational efficiency. Since version 3.2.11, this part of the code has been isolated in a separate package called cythonbiogeme. Binaries for Mac OSX and Windows are available for versions of Python ranging from 3.7 to 3.11. If, for some reason, the binary distribution for your system is not available, pip will attempt to compile the package from sources. In that case, a proper environment to compile C++ code is required. Such an environment is generally readily available on Linux and on MacOSX (if Xcode has been installed). It may be more complicated to set up on Windows.
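As a quick pre-installation check, the following sketch tells whether a prebuilt cythonbiogeme binary should exist for the running interpreter, based only on the statement above (Mac OSX or Windows, Python 3.7 to 3.11, as of version 3.2.11). The helper name `binary_expected` is hypothetical, and it assumes the usual `sys.platform` values `darwin` and `win32`. Later releases may support other Python versions, so check the package index for the current list.

```python
import sys

# Python versions with binary cythonbiogeme distributions, as stated for 3.2.11
BINARY_RANGE = ((3, 7), (3, 11))
# Assumed sys.platform values for Mac OSX and Windows
BINARY_PLATFORMS = ("darwin", "win32")


def binary_expected(version_info=sys.version_info, platform=sys.platform):
    """Hypothetical helper: True if a prebuilt binary should be available."""
    major_minor = (version_info[0], version_info[1])
    in_range = BINARY_RANGE[0] <= major_minor <= BINARY_RANGE[1]
    return in_range and platform in BINARY_PLATFORMS


if sys.version_info.major < 3:
    print("Biogeme requires Python 3.")
elif binary_expected():
    print("A binary distribution of cythonbiogeme should be available.")
else:
    print("No binary expected: pip will try to compile cythonbiogeme from sources.")
```

On Linux, the check always reports that compilation is expected, which matches the remark that a C++ compiler is usually readily available there.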
The command to install CythonBiogeme from source is
pip install -ve .
that must be executed in the directory containing the files setup.cfg and setup.py.
Note that it requires a proper environment to compile C++ code. In general, it is readily available on Linux, and MacOSX (if Xcode has been installed).
On Windows, here is one possibility.
extra_compile_args = -std=c++11 -DMS_WIN64
extra_link_args = -static -std=c++11 -static-libstdc++ -static-libgcc -Bstatic -lpthread -mms-bitfields -mwindows -Wl,-Bstatic,--whole-archive -Wl,--no-whole-archive
gendef python3x.dll
dlltool -D python3x.dll -d python3x.def -l libpython3x.a
gendef vcruntime140.dll
dlltool -D vcruntime140.dll -d vcruntime140.def -l libvcruntime140.a
pip install -ve .
The command to install Biogeme from source is
pip install -ve .
that must be executed in the directory containing the files setup.cfg and setup.py.
Note that it does not require compiling C++ code (thanks to CythonBiogeme) and should work in any environment where Python and CythonBiogeme are properly installed.
To verify that Biogeme is correctly installed, you can print its version. To do so, execute the following commands in Python:
import biogeme.version as ver
print(ver.getText())
Python 3.10.4 (main, Mar 31 2022, 03:38:35) [Clang 12.0.0 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import biogeme.version as ver
>>> print(ver.getText())
biogeme 3.2.11 [2023-04-19]
Home page: http://biogeme.epfl.ch
Submit questions to https://groups.google.com/d/forum/biogeme
Michel Bierlaire, Transport and Mobility Laboratory, Ecole Polytechnique Fédérale de Lausanne (EPFL)
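If importing Biogeme itself fails, the installed versions can still be inspected without importing the package. The sketch below relies on the standard library module `importlib.metadata` (Python 3.8 or later); the helper name `installed_version` is hypothetical. It reports both biogeme and cythonbiogeme, since from version 3.2.11 the former depends on the latter.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Hypothetical helper: installed version of a distribution, or None if missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report both packages, since Biogeme 3.2.11 relies on cythonbiogeme
for name in ("biogeme", "cythonbiogeme"):
    found = installed_version(name)
    print(f"{name}: {found if found is not None else 'not installed'}")
```

This is also a convenient first step when diagnosing version conflicts between packages.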
If you need help, submit your questions to the users' group:
The forum is moderated. Please keep the following in mind before posting a question:
Note: versions 3.2.9 and 3.2.10 are identical. Therefore, version 3.2.9 has been removed from the official distribution platform.
DefineVariable actually defines a new column in the database. The old syntax was:
myvar = DefineVariable('myvar', x * y + 2,
The new syntax is:
myvar = database.DefineVariable('myvar', x * y + 2)
recycle=True. See the online documentation [here].
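To illustrate what "defines a new column in the database" means for DefineVariable, here is a pandas-only sketch of the equivalent operation on a toy data frame. It is a conceptual analogy, not Biogeme's implementation: Biogeme evaluates the expression itself, whereas here pandas computes the column directly. The column names `x` and `y` are assumptions chosen to match the example above.

```python
import pandas as pd

# Toy data frame standing in for the data held by a Biogeme database
df = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [4.0, 5.0, 6.0]})

# Conceptual pandas equivalent of database.DefineVariable('myvar', x * y + 2):
# the expression is evaluated row by row and stored as a new column
df["myvar"] = df["x"] * df["y"] + 2
print(df["myvar"].tolist())  # [6.0, 12.0, 20.0]
```

Because the values are stored with the data, they are computed once rather than re-evaluated at each iteration of the estimation.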
displayUsedVariables in the BIOGEME constructor have been removed.
NamedTuple to make the code more readable. Refer to the examples, such as
Note that versions 3.2.7 and 3.2.8 are almost identical. The description below is relative to version 3.2.6.
algorithms.py contains generic optimization algorithms. The module
optimization.py contains the functions that can be called directly by Biogeme [Click here for the documentation of the
estimate function]. [Click here for an example.]
.iter. If the file exists, Biogeme will initialize the parameters from this file, and ignore the starting values provided. To turn this feature off, set
split function of the database object.
estimate function, and the
optimization module. See also an example.
userNotes parameter of the
biogeme object. See documentation. See example.
suggestScales parameter of the
biogeme object. See documentation.
quickEstimate performs the estimation of the parameters, and skips the calculation of the statistics. See documentation.
database module allows splitting the database into an estimation set and a validation set, for out-of-sample validation. See documentation. It is used by the new function
biogeme module. See documentation. See example.
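The idea behind splitting the database for out-of-sample validation can be sketched with pandas alone: a random subset of rows is held out for validation, and the remaining rows are used for estimation. The helper `split_estimation_validation`, its 20% default, and the toy data are hypothetical; this is not the split function of the Biogeme database module, whose exact behavior is described in its documentation.

```python
import pandas as pd


def split_estimation_validation(df, validation_fraction=0.2, seed=42):
    """Hypothetical helper: randomly split rows into estimation and validation sets."""
    # Draw the validation rows; the fixed seed makes the split reproducible
    validation = df.sample(frac=validation_fraction, random_state=seed)
    # Every row not drawn for validation is kept for estimation
    estimation = df.drop(validation.index)
    return estimation, validation


df = pd.DataFrame({
    "CHOICE": [1, 2, 3, 1, 2, 3, 1, 2, 3, 1],
    "TRAIN_TT": [112, 103, 130, 103, 130, 112, 103, 130, 112, 103],
})
estimation, validation = split_estimation_validation(df)
print(len(estimation), len(validation))  # 8 2
```

The two sets are disjoint by construction, so the model is validated on observations it has never seen during estimation.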
In order to better comply with good programming practice in Python, the syntax to import the variable names from the data file has been modified since version 3.2.5. The file headers.py is not generated anymore.
The best practice is to declare every variable explicitly:
from biogeme.expressions import Variable

PURPOSE = Variable('PURPOSE')
CHOICE = Variable('CHOICE')
GA = Variable('GA')
TRAIN_CO = Variable('TRAIN_CO')
CAR_AV = Variable('CAR_AV')
SP = Variable('SP')
TRAIN_AV = Variable('TRAIN_AV')
TRAIN_TT = Variable('TRAIN_TT')
If, for any reason, this explicit declaration is not desired, it is possible to replace the statement
from headers import *
database is the object containing the database, created as follows:
import pandas as pd
import biogeme.database as db

# The swissmetro data file is tab-separated
df = pd.read_csv('swissmetro.dat', sep='\t')
database = db.Database('swissmetro', df)
Also, in order to avoid any ambiguity, the operators used by Biogeme must be explicitly imported. For instance:
from biogeme.expressions import Beta, bioDraws, PanelLikelihoodTrajectory, MonteCarlo, log
Note that it is also possible to import all of them using the following syntax:
from biogeme.expressions import *
although this is not a good Python programming practice.
Yes. It is actually the default behavior. At each iteration, Biogeme creates a file __myModel.iter. This file will be read the next time Biogeme tries to estimate the same model. If you want to turn this feature off, set the BIOGEME class
If the model returns a probability 0 for the chosen alternative for at least one observation in the sample, then the likelihood is 0, and the log likelihood is minus infinity. For the sake of robustness, Biogeme assigns the value -1.797693e+308 to the log likelihood in
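The value -1.797693e+308 is the most negative finite double-precision number, that is, minus `sys.float_info.max`. The following sketch reproduces the arithmetic with NumPy on illustrative probabilities. It only shows why a single zero probability turns the sum into minus infinity and how a finite substitute can be used; it is not Biogeme's actual code.

```python
import sys

import numpy as np

# Probability of the chosen alternative for each observation (illustrative values)
chosen_probs = np.array([0.3, 0.5, 0.0, 0.2])

# log(0) evaluates to -inf, so the whole log likelihood becomes -inf
with np.errstate(divide="ignore"):
    raw_loglike = np.sum(np.log(chosen_probs))

# Replace -inf by the most negative finite double
safe_loglike = raw_loglike if np.isfinite(raw_loglike) else -sys.float_info.max

print(raw_loglike)            # -inf
print(f"{safe_loglike:e}")    # -1.797693e+308
```

A finite value lets the optimization algorithm compare candidate parameter values instead of failing on an infinite objective.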
A possible reason is when the initial value of a scale parameter is too close to zero.
But there are many other possible reasons. The best way to investigate the source of the problem is to use Biogeme in simulation mode, and report the probability of the chosen alternative for each observation. Once you have identified the problematic entries, it is easier to investigate the reason why the model returns a probability of zero.
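Both points above can be illustrated on a small logit model simulated with NumPy: compute the probability of the chosen alternative for each observation and list those equal to zero. The example assumes a hypothetical parameterization where the utilities are divided by a scale parameter, so that a scale close to zero inflates the utility differences until some probabilities underflow to exactly 0. The function name, the utilities and the choices are made up for illustration and do not use Biogeme's simulation API.

```python
import numpy as np


def chosen_probabilities(utilities, choices, scale):
    """Logit probability of the chosen alternative for each observation.

    utilities: (n_obs, n_alt) array of systematic utilities
    choices: (n_obs,) array of chosen alternative indices
    scale: hypothetical scale parameter; utilities are divided by it
    """
    scaled = utilities / scale
    # Subtract the row maximum before exponentiating to avoid overflow
    scaled = scaled - scaled.max(axis=1, keepdims=True)
    expo = np.exp(scaled)
    probs = expo / expo.sum(axis=1, keepdims=True)
    return probs[np.arange(len(choices)), choices]


utilities = np.array([[1.0, 0.5, 0.0],
                      [0.2, 0.4, 0.1],
                      [2.0, 0.0, 1.0]])
choices = np.array([0, 1, 1])

for scale in (1.0, 1e-3):
    p = chosen_probabilities(utilities, choices, scale)
    zero_rows = np.flatnonzero(p == 0.0)
    print(f"scale={scale}: zero-probability observations {zero_rows.tolist()}")
```

With a scale of 1, every chosen alternative has a positive probability; with a scale of 0.001, observation 2 (whose chosen alternative has the lowest utility) gets a probability of exactly 0, which is the kind of entry worth inspecting.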
C:\Users\[USER_NAME]\anaconda3\DLLs or C:\ProgramData\Anaconda3\DLLs.
ImportError: dlopen(/Users/~/anaconda3/lib/python3.6/site-packages/biogeme/cbiogeme.cpython-36m-darwin.so, 2): Symbol not found: __ZNSt15__exception_ptr13exception_ptrD1Ev
It is likely due to a version conflict between Python packages. The best way to deal with it is to reinstall Biogeme using the following steps:
pip install --upgrade pip
pip uninstall biogeme
pip install --upgrade cython
pip install biogeme --no-cache-dir
conda install gcc
If it does not work, try creating a new conda environment:
conda create -n python310 python=3.10 pip
conda activate python310
pip install biogeme
If it does not work... I don't know :-(
Running setup.py install for biogeme ... error
Complete output from command c:\users\willi\anaconda3\python.exe -u -c "import setuptools, tokenize; __file__='C:\Users\willi\AppData\Local\Temp\pip-install-iaflhasr\biogeme\setup.py'; f=getattr(tokenize, 'open', open)(__file__); code=f.read().replace('\r\n', '\n'); f.close(); exec(compile(code, __file__, 'exec'))" install --record C:\Users\willi\AppData\Local\Temp\pip-record-v6_zn0ff\install-record.txt --single-version-externally-managed --compile:
Using Cython
Please put "# distutils: language=c++" in your .pyx or .pxd file(s)
running install
It means that no binaries are available for your version of Python. To check which versions are supported, go to the repository
For instance, the following files are available for version 3.2.10:
biogeme-3.2.10-cp36-cp36m-macosx_10_9_x86_64.whl
It means that you can use Python 3.7, 3.8 and 3.9 on both platforms, while the version for Python 3.6 is only available on MacOSX.
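Reading such a file name is mechanical: by the wheel naming convention (PEP 427), the last three dash-separated fields are the Python tag (cp36 means CPython 3.6), the ABI tag and the platform tag. The sketch below extracts them and compares the Python tag with the running interpreter. The helper name `wheel_tags` is hypothetical, and the comparison ignores the platform tag, which must also match your system.

```python
import sys


def wheel_tags(filename):
    """Hypothetical helper: return the (python, abi, platform) tags of a wheel file name."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[: -len(".whl")].split("-")
    # name-version[-build]-python-abi-platform: 5 or 6 fields
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel file name: {filename}")
    return tuple(parts[-3:])


name = "biogeme-3.2.10-cp36-cp36m-macosx_10_9_x86_64.whl"
python_tag, abi_tag, platform_tag = wheel_tags(name)
print(python_tag, abi_tag, platform_tag)  # cp36 cp36m macosx_10_9_x86_64

# Tag of the running CPython interpreter, e.g. cp310 for Python 3.10
current_tag = f"cp{sys.version_info.major}{sys.version_info.minor}"
print("Python tag matches this interpreter:", python_tag == current_tag)
```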
This video has been recorded for earlier versions of Biogeme. Some aspects may not apply to the current version. A new video is under preparation.
The following technical reports walk through concrete examples to help you get familiar with the software.
EPFL proposes a 5-day short course entitled "Discrete Choice Analysis: Predicting Individual Behavior and Market Demand". It is organized every year in March (occasionally in February).
Lecturers:
Prof. Moshe Ben-Akiva, Massachusetts Institute of Technology, Cambridge, MA (USA)
Prof. Daniel McFadden, University of Southern California [Nobel Prize Laureate, 2000]
Prof. Michel Bierlaire, Ecole Polytechnique Fédérale de Lausanne, Switzerland
The University of Sydney Business School offers a course taught by Prof. David Hensher, Prof. Michiel Bliemer, Prof. John Rose and Dr. Andrew Collins.
The releases of PandasBiogeme are available on the Python Package Index repository.
Around 1990, Michel Bierlaire wrote a software package called HieLoW: Hierarchical Logit for Windows. It was written in Borland C++, and was the first discrete choice estimation software with a graphical user interface. It was designed for the estimation of logit and nested logit models. The user had to specify the models through a graphical user interface. This software was distributed by Stratec SA, Brussels.
Around 2000, the first version of Biogeme was released. Written in GNU C++, it was the first open source discrete choice software. It was designed to estimate the parameters of a list of predetermined discrete choice models such as logit, binary probit, nested logit, cross-nested logit, multivariate extreme value models, discrete and continuous mixtures of multivariate extreme value models, models with nonlinear utility functions, models designed for panel data, and heteroscedastic models. The modeling language was designed to be simple, and was developed using a general-purpose parser generator called GNU Bison. It was later referred to as BisonBiogeme. The distributions can be found here.
Around 2010, a more flexible version was designed for general-purpose parametric models. The modeling language was extended and based on the Python language. A series of discrete choice models were precoded for ease of use. This version was also written in GNU C++, and its distributions can be found here.
In 2018, a completely new version of the software was released. It was no longer a standalone executable, but a Python package. The package is written in Python, except for the core calculations of the models, which are written in C++ for the sake of efficiency. The motivation was to combine ease of use (especially for teaching purposes) with the sophistication provided by Python (for research and applications). Moreover, the management of the data relies on the package Pandas, which has become the workhorse of data scientists. Therefore, the name PandasBiogeme has been adopted. It is distributed on the Python Package Index repository.