Biogeme: utilities

mod2py: bison to python [included in the distribution]

This utility reads a model description file written for the Bison version of Biogeme (.mod) and transforms it into a model description file for the Python version of Biogeme (.py).

Syntax: mod2py mymodel transforms the file mymodel.mod into mymodel.py.

Example .mod file:

```
[Choice]
CHOICE

[Beta]
// Name Value  LowerBound UpperBound  status (0=variable, 1=fixed)
ASC_CAR 	0 -10              10              0
ASC_TRAIN  	0 -10              10              0
ASC_SM	        0 -10              10              1
B_TIME		0 -10              10              0
B_COST		0 -10              10              0

[Utilities]
// Id Name     Avail       linear-in-parameter expression (beta1*x1 + beta2*x2 + ... )
    1 A1_TRAIN TRAIN_AV_SP ASC_TRAIN * one 
                            + B_TIME * TRAIN_TT_SCALED 
                            + B_COST * TRAIN_COST_SCALED
    2 A2_SM    SM_AV          ASC_SM * one
                            + B_TIME * SM_TT_SCALED
                            + B_COST * SM_COST_SCALED
    3 A3_Car   CAR_AV_SP     ASC_CAR * one 
                            + B_TIME * CAR_TT_SCALED
                            + B_COST * CAR_CO_SCALED

[Expressions]
one = 1
CAR_AV_SP =  CAR_AV   * (  SP   !=  0  )
TRAIN_AV_SP =  TRAIN_AV   * (  SP   !=  0  )
SM_COST =  SM_CO   * (  GA   ==  0  ) 
TRAIN_COST =  TRAIN_CO   * (  GA   ==  0  )
TRAIN_TT_SCALED = TRAIN_TT / 100.0
TRAIN_COST_SCALED = TRAIN_COST / 100
SM_TT_SCALED = SM_TT / 100.0
SM_COST_SCALED = SM_COST / 100
CAR_TT_SCALED = CAR_TT / 100
CAR_CO_SCALED = CAR_CO / 100

[Exclude]
(( PURPOSE != 1 ) * (  PURPOSE   !=  3  ) + ( CHOICE == 0 )) 

[Model]
$MNL
```

Generated .py file:

```python
# This file has automatically been generated.
# Tue Apr 21 09:36:59 2015
# Michel Bierlaire, EPFL
# biogeme 2.4 [Lun 19 jan 2015 18:40:42 CET]
# Michel Bierlaire, EPFL

#####################################################
# This file complies with the syntax of pythonbiogeme
# In general, it may require to be edited by hand before being operational
# It is meant to help users translating their models from the previous version of biogeme to the python version.
#####################################################

from biogeme import *
from headers import *
from loglikelihood import *
from statistics import *

# [Choice]
__chosenAlternative = CHOICE 

# [Weight]
# NONE

#[Beta]
#Parameters to be estimated
# Arguments:
#   1  Name for report. Typically, the same as the variable
#   2  Starting value
#   3  Lower bound
#   4  Upper bound
#   5  0: estimate the parameter, 1: keep it fixed
ASC_CAR	 = Beta('ASC_CAR',0,-10,10,0)
ASC_SM	 = Beta('ASC_SM',0,-10,10,1)
ASC_TRAIN	 = Beta('ASC_TRAIN',0,-10,10,0)
B_COST	 = Beta('B_COST',0,-10,10,0)
B_TIME	 = Beta('B_TIME',0,-10,10,0)

# [Expressions] 
# Define here arithmetic expressions for name that are not directly 
# available from the data

one  = DefineVariable('one',1)
CAR_AV_SP  = DefineVariable('CAR_AV_SP', CAR_AV   * (  SP   !=  0  ))
TRAIN_AV_SP  = DefineVariable('TRAIN_AV_SP', TRAIN_AV   * (  SP   !=  0  ))
SM_COST  = DefineVariable('SM_COST', SM_CO   * (  GA   ==  0  ))
TRAIN_COST  = DefineVariable('TRAIN_COST', TRAIN_CO   * (  GA   ==  0  ))
TRAIN_TT_SCALED  = DefineVariable('TRAIN_TT_SCALED', TRAIN_TT   /  100 )
TRAIN_COST_SCALED  = DefineVariable('TRAIN_COST_SCALED', TRAIN_COST   /  100 )
SM_TT_SCALED  = DefineVariable('SM_TT_SCALED', SM_TT   /  100 )
SM_COST_SCALED  = DefineVariable('SM_COST_SCALED', SM_COST   /  100 )
CAR_TT_SCALED  = DefineVariable('CAR_TT_SCALED', CAR_TT   /  100 )
CAR_CO_SCALED  = DefineVariable('CAR_CO_SCALED', CAR_CO   /  100 )

#[Group]

#[Utilities]
__A1_TRAIN = ASC_TRAIN * one + B_TIME * TRAIN_TT_SCALED + B_COST * TRAIN_COST_SCALED
__A2_SM = ASC_SM * one + B_TIME * SM_TT_SCALED + B_COST * SM_COST_SCALED
__A3_Car = ASC_CAR * one + B_TIME * CAR_TT_SCALED + B_COST * CAR_CO_SCALED
__V = {1: __A1_TRAIN,2: __A2_SM,3: __A3_Car}
__av = {1: TRAIN_AV_SP,2: SM_AV,3: CAR_AV_SP}

#[Draws]
BIOGEME_OBJECT.PARAMETERS['NbrOfDraws'] = "150"
#[Exclude]
BIOGEME_OBJECT.EXCLUDE = ( (  PURPOSE   !=  1  ) * (  PURPOSE   !=  3  ) ) + (  CHOICE   ==  0  )

#[Model]
# MNL  // Logit Model
# The choice model is a logit, with availability conditions
prob = bioLogit(__V,__av,__chosenAlternative)
__l = log(prob)

# Defines an itertor on the data
rowIterator('obsIter') 

# Define the likelihood function for the estimation
BIOGEME_OBJECT.ESTIMATE = Sum(__l,'obsIter')
```
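The translation of the [Beta] section is mechanical enough to sketch in a few lines of Python. The function translate_beta_section below is hypothetical (it is not part of mod2py) and handles only that one section; under that assumption it reproduces the Beta(...) lines of the generated file, including their alphabetical order.

```python
def translate_beta_section(lines):
    """Hypothetical sketch: convert .mod [Beta] rows into pythonbiogeme Beta(...) lines.

    Each row is: Name Value LowerBound UpperBound status (0=variable, 1=fixed).
    """
    parameters = []
    for line in lines:
        line = line.split("//", 1)[0].strip()  # drop // comments and surrounding blanks
        if not line:
            continue
        name, value, lower, upper, status = line.split()
        parameters.append((name, value, lower, upper, status))
    # The generated file lists the parameters in alphabetical order of their names.
    parameters.sort()
    return [
        f"{name}\t = Beta('{name}',{value},{lower},{upper},{status})"
        for name, value, lower, upper, status in parameters
    ]


beta_section = [
    "// Name Value  LowerBound UpperBound  status (0=variable, 1=fixed)",
    "ASC_CAR \t0 -10 10 0",
    "ASC_TRAIN  \t0 -10 10 0",
    "ASC_SM\t0 -10 10 1",
    "B_TIME\t\t0 -10 10 0",
    "B_COST\t\t0 -10 10 0",
]
for python_line in translate_beta_section(beta_section):
    print(python_line)
```

Translating [Expressions], [Utilities] or [Exclude] would require a real expression parser, which is one reason the generated file may still need to be edited by hand, as its header warns.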

Likelihood ratio test [Download here]

It is an Excel file where the user can apply the likelihood ratio test. The final log likelihood and the number of estimated parameters for both the restricted and the unrestricted models must be provided.
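The computation can be sketched in Python, assuming the usual form of the test: the statistic -2(LL_R - LL_U) is compared with a chi-squared distribution whose degrees of freedom equal the difference in the number of estimated parameters. The chi2_sf helper and the log likelihood values are illustrative assumptions, not taken from the spreadsheet; the helper only covers integer degrees of freedom, which is all this test needs.

```python
import math


def chi2_sf(x, df):
    """P(X > x) for X ~ chi-squared with a positive integer number of degrees of freedom.

    Computes the regularized upper incomplete gamma Q(df/2, x/2), starting from
    Q(1, y) = exp(-y) or Q(1/2, y) = erfc(sqrt(y)) and applying the recurrence
    Q(s + 1, y) = Q(s, y) + y**s * exp(-y) / Gamma(s + 1).
    """
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        s, q = 1.0, math.exp(-y)
    else:
        s, q = 0.5, math.erfc(math.sqrt(y))
    while s < df / 2.0:
        q += math.exp(s * math.log(y) - y - math.lgamma(s + 1.0))
        s += 1.0
    return q


def likelihood_ratio_test(ll_restricted, k_restricted, ll_unrestricted, k_unrestricted):
    """Return the LR statistic, its degrees of freedom and the p-value."""
    statistic = -2.0 * (ll_restricted - ll_unrestricted)
    df = k_unrestricted - k_restricted  # number of restrictions
    return statistic, df, chi2_sf(statistic, df)


# Hypothetical final log likelihoods and numbers of estimated parameters.
stat, df, p_value = likelihood_ratio_test(-5331.252, 4, -5315.386, 6)
print(f"LR = {stat:.3f}, df = {df}, p-value = {p_value:.2e}")
```

The restricted model is rejected at level alpha when the p-value is below alpha.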

CNL correlation [Download here]

It is Matlab code that computes the correlation structure of a cross-nested logit model.

Variance computation [Download here]

It is an Excel sheet that computes the variance of the difference and of the ratio of two estimators. A typical application in the context of discrete choice is the computation of the standard error of parameters such as the value of time. The approximation used here comes from a Taylor series expansion, where terms higher than second order are ignored. Source: MVA et al. (1998) "Value of Travel Time Savings".
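A minimal Python sketch of these quantities, assuming the standard first-order Taylor (delta method) expansion for the ratio; whether the spreadsheet uses exactly these expressions is an assumption. The function names and the coefficient values are hypothetical.

```python
import math


def var_difference(var_a, var_b, cov_ab):
    """Var(a - b) = Var(a) + Var(b) - 2 Cov(a, b); exact for a difference."""
    return var_a + var_b - 2.0 * cov_ab


def var_ratio(a, b, var_a, var_b, cov_ab):
    """First-order Taylor (delta method) approximation of Var(a / b).

    The gradient of a / b is (1 / b, -a / b**2), which gives
    Var(a / b) ~ Var(a) / b**2 + a**2 Var(b) / b**4 - 2 a Cov(a, b) / b**3.
    """
    return var_a / b**2 + a**2 * var_b / b**4 - 2.0 * a * cov_ab / b**3


# Hypothetical estimates: time and cost coefficients with their (co)variances.
b_time, b_cost = -1.28, -1.08
var_time, var_cost, cov_time_cost = 0.0036, 0.0026, 0.0011
value_of_time = b_time / b_cost
se_value_of_time = math.sqrt(var_ratio(b_time, b_cost, var_time, var_cost, cov_time_cost))
print(f"value of time = {value_of_time:.3f} (s.e. {se_value_of_time:.3f})")
```

The value of time obtained this way is expressed in the units implied by the time and cost variables of the model, including any scaling applied to them.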

Checking data files with biocheckdata [included in the distribution]

The syntax is:
biocheckdata mydata.dat
The script checks whether a data file complies with the requirements of Biogeme. In particular, it checks that the number of entries in each row matches the number of headers in the first row. It also detects entries that are not numeric.

The script should be available after the installation procedure is complete. By default, it is installed as /usr/local/bin/biocheckdata.
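The two checks can be sketched in Python as follows. This is not the biocheckdata script itself: check_data is a hypothetical name, and the sketch assumes whitespace-separated (for example, tab-separated) columns with the headers on the first line.

```python
def check_data(text):
    """Hypothetical sketch of the two checks: row lengths and numeric entries.

    Assumes whitespace-separated columns with the headers on the first line.
    """
    lines = text.splitlines()
    headers = lines[0].split()
    problems = []
    for line_number, line in enumerate(lines[1:], start=2):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split()
        if len(fields) != len(headers):
            problems.append(
                f"line {line_number}: {len(fields)} entries for {len(headers)} headers"
            )
        for field in fields:
            try:
                float(field)
            except ValueError:
                problems.append(f"line {line_number}: non-numeric entry {field!r}")
    return problems


sample = "ID\tCHOICE\tTIME\n1\t2\t10.5\n2\t1\n3\tx\t4\n"
for problem in check_data(sample):
    print(problem)
```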

Prepare the data file with biopreparedata [included in the distribution]

The syntax is:
biopreparedata mydata.csv

The script converts a CSV data file into the format required by Biogeme. Each column containing strings is coded with numbers. The following conventions are adopted:

- Strings are delimited with double quotes (`"`).
- Each blank in the name of a header is replaced by an underscore (`_`).
- Entries are separated by a comma (`,`).
- If an entry of the first data row is numeric, the corresponding column is supposed to contain only numerical values. If a non-numerical value is detected in another row, it is replaced by 99999.

You may edit the file /usr/local/share/biogeme/biopreparedata.py in order to change these conventions.

If an entry of the first data row is a string, the script associates each distinct string in the corresponding column with a numeric code.

Two files are generated:

- biogeme_mydata.csv is the data file complying with the Biogeme requirements;
- legend_mydata.csv describes the codes that have been used for non-numeric data.

Suppose that the file mydata.csv contains:

```
Id,   The name, The rank
1,  "Me",  2
2, "You", 3
3, "Him", 3
```

Then the generated file biogeme_mydata.csv contains:

```
Id      ___The_name     _The_rank
1       2        2
2       0        3
3       1        3
```

and the generated file legend_mydata.csv contains:

```
+++++++++++++++++++++++++
Legend for column  ___The_name
+++++++++++++++++++++++++
0 :       "You"
1 :       "Him"
2 :        "Me"
```
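These conventions can be sketched in Python with the standard csv module. The prepare function is hypothetical, not the code in biopreparedata.py. Two assumed differences: strings are coded in order of first appearance, so the codes differ from those of the example (where "You" is 0), and the legend keeps the strings without their quotes.

```python
import csv
import io

MISSING = 99999  # replacement for non-numeric entries in numeric columns


def is_number(text):
    try:
        float(text)
        return True
    except ValueError:
        return False


def prepare(csv_text):
    """Hypothetical sketch of the conventions above; returns (headers, rows, legend)."""
    raw_header, *raw_rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    # Each blank in a header name is replaced by an underscore.
    headers = [name.replace(" ", "_") for name in raw_header]
    # Remove blanks around each entry, then the double quotes delimiting strings.
    rows = [[field.strip().strip('"') for field in row] for row in raw_rows]
    # The first data row decides whether a column is numeric or holds strings.
    numeric = [is_number(value) for value in rows[0]]
    legend = {name: {} for name, num in zip(headers, numeric) if not num}
    coded_rows = []
    for row in rows:
        coded = []
        for name, num, value in zip(headers, numeric, row):
            if num:
                coded.append(value if is_number(value) else str(MISSING))
            else:
                codes = legend[name]  # a string not seen before gets the next free code
                coded.append(str(codes.setdefault(value, len(codes))))
        coded_rows.append(coded)
    return headers, coded_rows, legend


text = 'Id,   The name, The rank\n1,  "Me",  2\n2, "You", 3\n3, "Him", 3\n'
headers, rows, legend = prepare(text)
print("\t".join(headers))
for row in rows:
    print("\t".join(row))
print(legend)
```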

Histograms [included in the distribution]

This tool organizes a list of raw numbers into bins of a given size, in order to plot a histogram. The raw data file should contain only a list of numbers, one per line. The syntax is histogram filename binSize.

For example, if the raw data file test.dat is

```
0.2748861875
1.2194215178
-0.1088626369
1.887765541
0.143842688
0.5121648584
0.9323810467
0.1969901739
-0.2501622963
0.6349579371
-0.6544964817
0.1684235135
1.0532380188
0.9794024028
0.8001565071
-0.7680558349
-2.5749162274
0.8619355768
-0.0267481139
-0.9200574846
```

then the command histogram test.dat 1.0 generates the file _hist_test.dat:

```
Value Frequency
-3.0 1
-1.0 6
0.0 10
1.0 3
```

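meaning that there is 1 value between -3 and -2, no value between -2 and -1, 6 values between -1 and 0, 10 values between 0 and 1, and 3 values between 1 and 2. Each bin is labeled by its lower bound, and empty bins are not listed. A Gnuplot file _hist.gp is also generated: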

```
set style data histogram
set style histogram cluster gap 0
set style fill solid 1.0
plot '_hist_test.dat' using 1:2 ti col smooth frequency with boxes
```
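The binning rule implied by the example (each value x goes to the bin starting at floor(x / binSize) * binSize, and empty bins are omitted) can be sketched in Python. The histogram function is hypothetical, not the distributed tool; applied to the data above, it reproduces the counts of _hist_test.dat.

```python
import math
from collections import Counter


def histogram(values, bin_size):
    """Count the values in each bin; a bin is labeled by its lower bound.

    x falls into [k * bin_size, (k + 1) * bin_size) with k = floor(x / bin_size).
    Empty bins do not appear in the result.
    """
    counts = Counter(math.floor(x / bin_size) * bin_size for x in values)
    return dict(sorted(counts.items()))


values = [0.2748861875, 1.2194215178, -0.1088626369, 1.887765541, 0.143842688,
          0.5121648584, 0.9323810467, 0.1969901739, -0.2501622963, 0.6349579371,
          -0.6544964817, 0.1684235135, 1.0532380188, 0.9794024028, 0.8001565071,
          -0.7680558349, -2.5749162274, 0.8619355768, -0.0267481139, -0.9200574846]

print("Value Frequency")
for lower, count in histogram(values, 1.0).items():
    print(f"{lower:.1f} {count}")
```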

A weighted version of the tool also exists. Instead of counting the number of values in each bin, it adds up the weights of these values. Each row of the raw data file must contain two values: the value and its weight. The syntax is weightedhistogram filename binSize.

For example, if the raw data file test.dat is

```
0.2748861875        0.9
1.2194215178        0.2
-0.1088626369       1.0
1.887765541         1.0
0.143842688         0.6
0.5121648584        0.1
0.9323810467        0.1
0.1969901739        0.4
-0.2501622963       0.3
0.6349579371        0.2
-0.6544964817       0.4
0.1684235135        0.1
1.0532380188        0.8
0.9794024028        0.9
0.8001565071        0.3
-0.7680558349       0.8
-2.5749162274       0.4
0.8619355768        0.1
-0.0267481139       0.3
-0.9200574846       1.0
```

then the command weightedhistogram test.dat 1.0 generates the file _hist_test.dat:

```
Value Frequency
-3.0 0.4
-1.0 3.8
0.0 3.7
1.0 2.0
```

meaning that the total weight of the values between -3 and -2 is 0.4, the total weight of the values between -1 and 0 is 3.8, the total weight of the values between 0 and 1 is 3.7, and the total weight of the values between 1 and 2 is 2.0. A Gnuplot file _hist.gp is also generated:

```
set style data histogram
set style histogram cluster gap 0
set style fill solid 1.0
plot '_hist_test.dat' using 2:xtic(1) t '_hist_test.dat'
```
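The weighted variant only changes what is accumulated per bin. This is again a hypothetical sketch (weighted_histogram is not the distributed tool) in which each value x is assigned to the bin starting at floor(x / binSize) * binSize; applied to the data above, it reproduces the totals of _hist_test.dat.

```python
import math
from collections import defaultdict


def weighted_histogram(pairs, bin_size):
    """Sum the weights of the values in each bin; a bin is labeled by its lower bound."""
    totals = defaultdict(float)
    for value, weight in pairs:
        totals[math.floor(value / bin_size) * bin_size] += weight
    return dict(sorted(totals.items()))


# Each row of the raw data file: a value and its weight.
raw = """0.2748861875 0.9
1.2194215178 0.2
-0.1088626369 1.0
1.887765541 1.0
0.143842688 0.6
0.5121648584 0.1
0.9323810467 0.1
0.1969901739 0.4
-0.2501622963 0.3
0.6349579371 0.2
-0.6544964817 0.4
0.1684235135 0.1
1.0532380188 0.8
0.9794024028 0.9
0.8001565071 0.3
-0.7680558349 0.8
-2.5749162274 0.4
0.8619355768 0.1
-0.0267481139 0.3
-0.9200574846 1.0"""

pairs = [tuple(map(float, line.split())) for line in raw.splitlines() if line.strip()]
print("Value Frequency")
for lower, total in weighted_histogram(pairs, 1.0).items():
    print(f"{lower:.1f} {total:.1f}")
```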

Sven Mueller's utilities [Access here]

An XML file for syntax highlighting of Biogeme .mod files in notepad, and an MS Excel sheet for the Horowitz test (non-nested hypotheses).
