BISON BIOGEME Walkthrough

To introduce the syntax of Biogeme, we explain in detail an example in which a logit model with 3 alternatives is estimated. Two files are necessary to run the example: the model specification file (01logit.mod) and the data file (swissmetro.dat).

The model

The model is a logit model with 3 alternatives. The utility functions are defined as:
V_1 = V_TRAIN =  ASC_TRAIN + B_TIME * TRAIN_TT_SCALED + B_COST * TRAIN_COST_SCALED
V_2 = V_SM = ASC_SM + B_TIME * SM_TT_SCALED + B_COST * SM_COST_SCALED
V_3 = V_CAR =  ASC_CAR + B_TIME * CAR_TT_SCALED + B_COST * CAR_CO_SCALED
where TRAIN_TT_SCALED, TRAIN_COST_SCALED, SM_TT_SCALED, SM_COST_SCALED, CAR_TT_SCALED, CAR_CO_SCALED are variables, and ASC_TRAIN, ASC_SM, ASC_CAR, B_TIME, B_COST are parameters to be estimated. Note that it is not possible to identify all alternative specific constants ASC_TRAIN, ASC_SM, ASC_CAR from data, because only differences between utilities affect the choice probabilities. Consequently, ASC_SM is normalized to 0. The availability of an alternative i is determined by the variable av_i, i = 1, 2, 3, which is equal to 1 if the alternative is available, 0 otherwise. The probability of choosing an available alternative i is given by the logit model:

P(i) = exp(V_i) / (av_1 exp(V_1) + av_2 exp(V_2) + av_3 exp(V_3)).

Given a data set of N observations, the loglikelihood of the sample is

L = Σ_n log P(i_n)

where i_n is the alternative actually chosen by individual n, and the sum runs over n = 1, ..., N.
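
The two formulas above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Biogeme's implementation; the function names and the two sample observations are made up for the example. The function returns av_i exp(V_i) divided by the denominator, which equals P(i) for an available alternative and 0 for an unavailable one.

```python
import math

def logit_probabilities(utilities, availability):
    """Return the logit probability of each alternative (0 if unavailable)."""
    # Weighted exponentials av_i * exp(V_i)
    weighted = [av * math.exp(v) for v, av in zip(utilities, availability)]
    denominator = sum(weighted)
    return [w / denominator for w in weighted]

def log_likelihood(observations):
    """Sum log P(i_n) over all observations.

    Each observation is (utilities, availability, chosen), where `chosen`
    is the 1-based identifier of the chosen alternative.
    """
    total = 0.0
    for utilities, availability, chosen in observations:
        probabilities = logit_probabilities(utilities, availability)
        total += math.log(probabilities[chosen - 1])
    return total

# Hypothetical utilities (train, Swissmetro, car) for two invented observations.
sample = [
    ([0.0, 0.0, 0.0], [1, 1, 1], 2),  # all alternatives available
    ([0.0, 0.0, 0.0], [1, 1, 0], 1),  # car not available
]
print(log_likelihood(sample))
```

With equal utilities, the first observation contributes log(1/3) and the second log(1/2), since only two alternatives are available in the second case.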

The data file

Biogeme assumes that the first line of the data file contains a list of labels corresponding to the available data, and that each subsequent line contains exactly the same number of numerical values, each row corresponding to an observation. Delimiters can be tabs or spaces. The data file used for this example is swissmetro.dat.
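
As an illustration of this layout, the following Python sketch reads a file with one header line of labels followed by rows of numbers. It is not Biogeme's reader: the function name, the three-column sample and the decision to skip blank lines are assumptions made for the example.

```python
import io

def read_data(stream):
    """Parse a header line of labels followed by rows of numerical values."""
    labels = stream.readline().split()  # split() accepts tabs and spaces
    rows = []
    for line_number, line in enumerate(stream, start=2):
        if not line.strip():
            continue  # assumption: blank lines are ignored
        values = [float(token) for token in line.split()]
        if len(values) != len(labels):
            raise ValueError(
                f"line {line_number}: expected {len(labels)} values, "
                f"got {len(values)}"
            )
        rows.append(dict(zip(labels, values)))
    return labels, rows

# Invented three-column sample mixing tab and space delimiters.
sample_file = io.StringIO("CHOICE\tTRAIN_TT\tSM_TT\n2 112 63\n1\t103\t60\n")
labels, rows = read_data(sample_file)
print(labels, len(rows))
```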

The model specification file

We explain here line by line the model specification file. It is organized into sections. In principle, the order in which the sections appear is irrelevant.

[ModelDescription]

This section contains a description of the model that will be copied into the report file. Each line of the description must be enclosed in double quotes.
[ModelDescription]
"Example of a logit model for a transportation mode choice with 3 alternatives:"
"- Train"
"- Car"
"- Swissmetro, an hypothetical high-speed train"

[Choice]

This section tells Biogeme where the dependent variable (that is, the chosen alternative) can be found in the data file.
[Choice]
CHOICE   
Note that the syntax is case sensitive, and that CHOICE is different from choice, and from Choice.

[Beta]

Each parameter to be estimated must be declared in this section. For each parameter, the following must be mentioned:
  1. the name of the parameter
  2. the default value
  3. a lower bound
  4. an upper bound
  5. a flag that indicates if the parameter must be estimated (0) or if it keeps its default value (1).
[Beta]
// Name Value  LowerBound UpperBound  status (0=variable, 1=fixed)
ASC_CAR 	0 -10              10              0
ASC_TRAIN  	0 -10              10              0
ASC_SM	        0 -10              10              1
B_TIME		0 -10              10              0
B_COST		0 -10              10              0
Note that the fifth entry for ASC_SM is 1, as we want to keep it at its default value, that is 0.0.
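
To make the role of the five entries concrete, here is a small Python sketch that parses the lines above and lists the parameters that will actually be estimated. It is an illustration only, with a hypothetical Beta record and parser; it is not how Biogeme reads the file.

```python
from typing import NamedTuple

class Beta(NamedTuple):
    name: str
    value: float   # default value
    lower: float   # lower bound
    upper: float   # upper bound
    fixed: bool    # status flag: 0 = estimated, 1 = kept at default value

def parse_beta_lines(text):
    """Parse 'name value lower upper status' lines, skipping // comments."""
    betas = []
    for line in text.splitlines():
        line = line.split("//")[0].strip()
        if not line or line.startswith("["):
            continue  # comment, blank line, or section header
        name, value, lower, upper, status = line.split()
        betas.append(
            Beta(name, float(value), float(lower), float(upper), status == "1")
        )
    return betas

section = """[Beta]
// Name Value LowerBound UpperBound status (0=variable, 1=fixed)
ASC_CAR   0 -10 10 0
ASC_TRAIN 0 -10 10 0
ASC_SM    0 -10 10 1
B_TIME    0 -10 10 0
B_COST    0 -10 10 0
"""
betas = parse_beta_lines(section)
free = [b.name for b in betas if not b.fixed]
print(free)
```

Only ASC_SM is left out of the list of free parameters, which reflects its normalization to 0.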

[LaTeX]

Among other output files, Biogeme generates a file in LaTeX format. In this section, the names of the parameters can be specified in LaTeX syntax, so that they appear properly in that output file.
[LaTeX]
ASC_CAR "Cte. car"
ASC_TRAIN "Cte. train"
ASC_SM	"Cte. Swissmetro"
B_TIME	"$\beta_\text{time}$"
B_COST	"$\beta_\text{cost}$"

[Utilities]

The specification of the utility functions is described in this section. The specification for one alternative must start at a new row, and may actually span several rows. For each of them, four entries are specified:
  1. The identifier of the alternative, with a numbering convention consistent with the section [Choice].
  2. The name of the alternative.
  3. The availability condition. In this case, it is a direct reference to one of the entries in the data file. The convention is that zero is treated as "false", and one is treated as "true". Actually, any value different from zero is considered as "true".
  4. The linear-in-parameter utility function is composed of a list of terms, separated by a +. Each term is composed of the name of a parameter and the name of an attribute, separated by a *. Note that a space is required after each parameter name.
[Utilities]
// Id Name  Avail  linear-in-parameter expression
    1 A1_TRAIN TRAIN_AV_SP ASC_TRAIN * one 
                            + B_TIME * TRAIN_TT_SCALED 
                            + B_COST * TRAIN_COST_SCALED
    2 A2_SM    SM_AV          ASC_SM * one
                            + B_TIME * SM_TT_SCALED
                            + B_COST * SM_COST_SCALED
    3 A3_Car   CAR_AV_SP     ASC_CAR * one 
                            + B_TIME * CAR_TT_SCALED
                            + B_COST * CAR_CO_SCALED
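
The structure of this section can be mirrored in Python as a table of (parameter, attribute) pairs, which makes the linear-in-parameter form explicit. The sketch below is illustrative: the parameter values and the observation are invented, and `one` is assumed to be an attribute equal to 1 (its definition is not shown in this walkthrough).

```python
# Id -> (name, availability variable, list of (parameter, attribute) terms)
UTILITIES = {
    1: ("A1_TRAIN", "TRAIN_AV_SP", [("ASC_TRAIN", "one"),
                                    ("B_TIME", "TRAIN_TT_SCALED"),
                                    ("B_COST", "TRAIN_COST_SCALED")]),
    2: ("A2_SM", "SM_AV", [("ASC_SM", "one"),
                           ("B_TIME", "SM_TT_SCALED"),
                           ("B_COST", "SM_COST_SCALED")]),
    3: ("A3_Car", "CAR_AV_SP", [("ASC_CAR", "one"),
                                ("B_TIME", "CAR_TT_SCALED"),
                                ("B_COST", "CAR_CO_SCALED")]),
}

def evaluate_utilities(betas, row):
    """Return {id: V_id}, each V being a sum of parameter * attribute terms."""
    return {
        alt_id: sum(betas[param] * row[attr] for param, attr in terms)
        for alt_id, (_name, _avail, terms) in UTILITIES.items()
    }

# Invented parameter values and one invented observation.
betas = {"ASC_TRAIN": -0.5, "ASC_SM": 0.0, "ASC_CAR": 0.25,
         "B_TIME": -1.0, "B_COST": -1.0}
row = {"one": 1.0,
       "TRAIN_TT_SCALED": 1.0, "TRAIN_COST_SCALED": 0.5,
       "SM_TT_SCALED": 0.5, "SM_COST_SCALED": 0.5,
       "CAR_TT_SCALED": 1.0, "CAR_CO_SCALED": 0.25}

V = evaluate_utilities(betas, row)
print(V)
```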

[Expressions]

This section describes to Biogeme how to compute attributes not directly available from the data file.
  • When boolean variables are involved, the value TRUE is represented by 1, and the value FALSE is represented by 0. Therefore, a multiplication involving a boolean variable is equivalent to an "AND" operator.
    CAR_AV_SP =  CAR_AV   * (  SP   !=  0  )
    TRAIN_AV_SP =  TRAIN_AV   * (  SP   !=  0  )
    SM_COST =  SM_CO   * (  GA   ==  0  ) 
    TRAIN_COST =  TRAIN_CO   * (  GA   ==  0  )
    
  • Variables can be rescaled
    TRAIN_TT_SCALED = TRAIN_TT / 100.0
    TRAIN_COST_SCALED = TRAIN_COST / 100
    SM_TT_SCALED = SM_TT / 100.0
    SM_COST_SCALED = SM_COST / 100
    CAR_TT_SCALED = CAR_TT / 100
    CAR_CO_SCALED = CAR_CO / 100
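
The same computations can be written in Python to check what each expression produces for a given observation. This is a sketch under assumptions: the function name and the sample row are invented, and comparisons are converted to 0 or 1 explicitly to follow the boolean convention described above.

```python
def derive_attributes(row):
    """Return a copy of `row` extended with the derived attributes."""
    out = dict(row)
    # Comparisons yield 1 (true) or 0 (false); multiplying acts as AND.
    out["CAR_AV_SP"] = row["CAR_AV"] * int(row["SP"] != 0)
    out["TRAIN_AV_SP"] = row["TRAIN_AV"] * int(row["SP"] != 0)
    out["SM_COST"] = row["SM_CO"] * int(row["GA"] == 0)
    out["TRAIN_COST"] = row["TRAIN_CO"] * int(row["GA"] == 0)
    # Rescaled variables
    out["TRAIN_TT_SCALED"] = row["TRAIN_TT"] / 100.0
    out["TRAIN_COST_SCALED"] = out["TRAIN_COST"] / 100
    out["SM_TT_SCALED"] = row["SM_TT"] / 100.0
    out["SM_COST_SCALED"] = out["SM_COST"] / 100
    out["CAR_TT_SCALED"] = row["CAR_TT"] / 100
    out["CAR_CO_SCALED"] = row["CAR_CO"] / 100
    return out

# Invented observation in which GA equals 1.
row = {"CAR_AV": 1, "TRAIN_AV": 1, "SP": 1, "GA": 1,
       "SM_CO": 52, "TRAIN_CO": 48,
       "TRAIN_TT": 150, "SM_TT": 50, "CAR_TT": 125, "CAR_CO": 75}
derived = derive_attributes(row)
print(derived["SM_COST"], derived["TRAIN_TT_SCALED"])
```

Because GA is 1 in this observation, the condition GA == 0 is false and both SM_COST and TRAIN_COST are set to 0.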
    

[Exclude]

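This section contains a boolean expression that is evaluated for each observation of the data file. Each observation for which this expression is "true" is discarded from the sample. Here, the modeler has developed the model only for work trips, so observations whose PURPOSE is neither 1 nor 3 are discarded. Observations for which the dependent variable CHOICE is 0 are also removed. In the expression, the multiplication acts as "AND" and the addition acts as "OR", since any value different from zero is considered as "true".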
(( PURPOSE != 1 ) * (  PURPOSE   !=  3  ) + ( CHOICE == 0 )) 
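
The logic of this expression can be checked with a short Python sketch. The function name and the test observations are invented; the arithmetic follows the expression above term by term.

```python
def is_excluded(row):
    """Evaluate the exclusion expression; any non-zero result means 'true'."""
    value = (int(row["PURPOSE"] != 1) * int(row["PURPOSE"] != 3)
             + int(row["CHOICE"] == 0))
    return value != 0

# Invented observations covering the different cases.
observations = [
    {"PURPOSE": 1, "CHOICE": 2},  # kept
    {"PURPOSE": 3, "CHOICE": 1},  # kept
    {"PURPOSE": 2, "CHOICE": 1},  # excluded: PURPOSE is neither 1 nor 3
    {"PURPOSE": 3, "CHOICE": 0},  # excluded: CHOICE is 0
    {"PURPOSE": 2, "CHOICE": 0},  # excluded: both conditions hold (value 2)
]
kept = [row for row in observations if not is_excluded(row)]
print(len(kept))
```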

[Model]

It tells Biogeme which assumptions must be used regarding the error term, that is which type of model must be estimated. In this example, it is the logit model (or MNL, for multinomial logit, as it is sometimes called).
[Model]
// $MNL stands for MultiNomial Logit 
$MNL

Running biogeme

If Biogeme has been installed properly, the estimation is started with the following statement:
biogeme 01logit swissmetro.dat
The following appears on the screen:

Information about the version of Biogeme. The date is when the software was compiled.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
biogeme 2.2 [Mar 3 jan 2012 18:26:41 CET]
Michel Bierlaire, EPFL
-- Compiled by michelbierlaire on Darwin
See http://biogeme.epfl.ch
                    !! CFSQP is available !!
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
	"In every non-trivial program there is at least one bug."

Biogeme checks if a file called mymodel.par (here, 01logit.par), containing various parameters, exists. If not, it checks if the file called default.par exists. If that file does not exist either, Biogeme creates it and sets default values for the parameters. That is what most users need in the beginning. Note that prefixes such as [14:57:01]patFileNames.cc:49 can be safely ignored.

[14:57:01]patFileNames.cc:49  01logit.par does not exist
[14:57:01]patFileNames.cc:53  Trying default.par instead
[14:57:01]patBiogeme.cc:178  File default.par does not exist. Default values will be used
[14:57:01]patBiogeme.cc:180  A file default.par has been created
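
The lookup order described above can be summarized with the following Python sketch. It only mirrors the order of the checks; the function name is hypothetical, and the empty placeholder written for default.par stands in for the default values that Biogeme actually writes.

```python
import tempfile
from pathlib import Path

def resolve_parameter_file(model_name, directory):
    """Return the parameter file to use, following the order described above."""
    directory = Path(directory)
    model_par = directory / f"{model_name}.par"
    if model_par.exists():
        return model_par
    default_par = directory / "default.par"
    if not default_par.exists():
        # Placeholder only: the real file would contain default settings.
        default_par.write_text("")
    return default_par

with tempfile.TemporaryDirectory() as tmp:
    chosen = resolve_parameter_file("01logit", tmp)
    print(chosen.name)
```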

Biogeme then reads the model and data files and reports various information.


 Opening file swissmetro.dat
 Data  file... line 500	Memory: 97 Kb
 Data  file... line 1000	Memory: 184 Kb
 Data  file... line 1500	Memory: 184 Kb
 Data  file... line 2000	Memory: 191 Kb
 Data  file... line 2500	Memory: 289 Kb
 Data  file... line 3000	Memory: 386 Kb
 Data  file... line 3500	Memory: 484 Kb
 Data  file... line 4000	Memory: 503 Kb
 Data  file... line 4500	Memory: 600 Kb
 Data  file... line 5000	Memory: 647 Kb
 Data  file... line 5500	Memory: 745 Kb
 Data  file... line 6000	Memory: 842 Kb
 Data  file... line 6500	Memory: 940 Kb
 Data  file... line 7000	Memory: 1 Mb
 Data  file... line 7500	Memory: 1 Mb
 Data  file... line 8000	Memory: 1 Mb
 Data  file... line 8500	Memory: 1 Mb
 Data  file... line 9000	Memory: 1 Mb
 Data  file... line 9500	Memory: 1 Mb
 Data  file... line 10000	Memory: 1 Mb
 Data  file... line 10500	Memory: 1 Mb
 Total obs.:   10727
 Total memory: 1321.88 Kb
 Run time for data processing: 00:01

Biogeme then starts the estimation. It displays miscellaneous information at each iteration of the estimation algorithm.

  Init loglike=-6964.66
     gmax Iter   radius        f(x)     Status       rhok nFree
 +1.44e-03    1 1.00e+00 +6.9646630e+03 ****Converg  +1.05e+00 4  ++
 +1.82e-03    2 2.00e+00 +5.5911931e+03 ****Converg  +9.74e-01 4  ++
 +1.93e-03    3 4.00e+00 +5.3677413e+03 ****Converg  +1.12e+00 4  ++
 +2.04e-03    4 8.00e+00 +5.3424604e+03 ****Converg  +1.52e+00 4  ++
 +1.92e-03    5 1.60e+01 +5.3362826e+03 ****Converg  +1.66e+00 4  ++
 +1.92e-03    6 3.20e+01 +5.3336219e+03 ****Converg  +1.69e+00 4  ++
 +1.97e-03    7 6.40e+01 +5.3324003e+03 ****Converg  +1.69e+00 4  ++
 +2.01e-03    8 1.28e+02 +5.3318102e+03 ****Converg  +1.70e+00 4  ++
 +2.03e-03    9 2.56e+02 +5.3315246e+03 ****Converg  +1.70e+00 4  ++
 +2.03e-03   10 5.12e+02 +5.3313855e+03 ****Converg  +1.70e+00 4  ++
 +1.43e-03   11 1.02e+03 +5.3313175e+03 ****Converg  +1.70e+00 4  ++
 +1.01e-03   12 2.05e+03 +5.3312842e+03 ****Converg  +1.70e+00 4  ++
 +7.11e-04   13 4.10e+03 +5.3312679e+03 ****Converg  +1.70e+00 4  ++
 +5.00e-04   14 8.19e+03 +5.3312598e+03 ****Converg  +1.70e+00 4  ++
 +3.52e-04   15 1.64e+04 +5.3312559e+03 ****Converg  +1.70e+00 4  ++
 +2.47e-04   16 3.28e+04 +5.3312539e+03 ****Converg  +1.70e+00 4  ++
 +1.74e-04   17 6.55e+04 +5.3312529e+03 ****Converg  +1.70e+00 4  ++
 +1.22e-04   18 1.31e+05 +5.3312525e+03 ****Converg  +1.70e+00 4  ++
 +8.58e-05   19 2.62e+05 +5.3312522e+03 ****Converg  +1.70e+00 4  ++
 +6.03e-05   20 5.24e+05 +5.3312521e+03 ****Converg  +1.70e+00 4  ++
 +4.23e-05   21 1.05e+06 +5.3312521e+03 ****Converg  +1.70e+00 4  ++
 +2.97e-05   22 2.10e+06 +5.3312520e+03 ****Converg  +1.70e+00 4  ++
 +2.09e-05   23 4.19e+06 +5.3312520e+03 ****Converg  +1.70e+00 4  ++
 +1.47e-05   24 8.39e+06 +5.3312520e+03 ****Converg  +1.70e+00 4  ++
 +1.03e-05   25 1.68e+07 +5.3312520e+03 ****Converg  +1.70e+00 4  ++
 +7.24e-06   26 3.36e+07 +5.3312520e+03 ****Converg  +1.70e+00 4  ++

 Convergence reached...
--> time interval [14:57:02,14:57:03]

Biogeme reports the running time and prepares the output files.

Run time: 00:01
 Final log-likelihood=-5331.25
 Be patient... BIOGEME is preparing the output files
--> time interval [14:57:03,14:57:03]
 Run time for var/covar computation: 00:00

For the record, Biogeme reports the list of files that were actually used as input.

 BIOGEME Input files
 ===================
 Parameters:			default.par
 Model specification:		01logit.mod
 Sample 1 :				swissmetro.dat

Biogeme reports the list of files that have been created, containing the results of the estimation, as well as many other pieces of information.

 BIOGEME Output files
 ====================
 Estimation results:		01logit.rep
 Estimation results (HTML):	01logit.html
 Estimation results (Latex):	01logit.tex
 Estimation results (ALogit):	01logit.F12
 Result model spec. file:	01logit.res
 Sample statistics:		01logit.sta

Biogeme also reports the names of files that may be helpful in understanding problems with the model.

 BIOGEME Debug files
 ===================
 Log file:			01logit.log
 Parameters debug:		parameters.out
 Model debug:			model.debug
 Model spec. file debug:		__specFile.debug

Biogeme reports some information specific to the model. For logit, it reports the minimum argument of all exponentials computed during the process, in order to signal a possible underflow. Most users do not worry about this information.

 Model informations: Multinomial Logit Model
 ==================
 The minimum argument of exp was -18.352

 Run time for estimation:      00:01
 Total run time:               00:02
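
To see why a minimum argument of -18.352 is harmless, the following Python sketch compares it with the point at which exp falls below the smallest normal floating-point number. It assumes IEEE double precision, which is what Python floats use on common platforms; whether Biogeme uses the same threshold is not stated here.

```python
import math
import sys

min_argument = -18.352  # value reported in the run above

# exp(-18.352) is small (on the order of 1e-08) but far from underflow.
print(math.exp(min_argument))

# Arguments below log(smallest normal double) leave the normal range.
threshold = math.log(sys.float_info.min)
print(threshold)

# A much more negative argument underflows to exactly zero.
print(math.exp(-800.0))
```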

For the results, most users will consult the HTML file using their preferred browser. A file written in ASCII format is also available, with the extension .rep. A file with LaTeX code is also created, so that the results can easily be integrated in a report or an article written with LaTeX.
