This webpage is for programmers who need examples of how to use the functions of the class. The examples are designed to illustrate the syntax only. They do not correspond to any meaningful model. For examples of models, visit biogeme.epfl.ch.
import datetime
print(datetime.datetime.now())
import biogeme.version as ver
print(ver.getText())
import numpy as np
import pandas as pd
import biogeme.expressions as ex
import biogeme.database as db
We first create a small database
df = pd.DataFrame({'Person':[1,1,1,2,2],
'Exclude':[0,0,1,0,1],
'Variable1':[10,20,30,40,50],
'Variable2':[100,200,300,400,500],
'Choice':[1,2,3,1,2],
'Av1':[0,1,1,1,1],
'Av2':[1,1,1,1,1],
'Av3':[0,1,1,1,1]})
myData = db.Database('test',df)
The following type of expression is a literal called Variable that corresponds to an entry in the database.
Person=ex.Variable('Person')
Variable1=ex.Variable('Variable1')
Variable2=ex.Variable('Variable2')
Choice=ex.Variable('Choice')
Av1=ex.Variable('Av1')
Av2=ex.Variable('Av2')
Av3=ex.Variable('Av3')
It is possible to add a new column to the database. This creates a new variable that can be used in expressions.
newvar = ex.DefineVariable('newvar',Variable1+Variable2,myData)
print(myData)
The following type of expression is another literal, corresponding to an unknown parameter.
beta1 = ex.Beta('beta1',0,None,None,0)
beta2 = ex.Beta('beta2',0,None,None,0)
beta3 = ex.Beta('beta3',1,None,None,1)
beta4 = ex.Beta('beta4',0,None,None,1)
Arithmetic operators are overloaded to allow standard manipulations of expressions. The first expression is $$e_1 = 2 \beta_1 - \frac{\exp(-\beta_2)}{\beta_3 (\beta_2 \geq \beta_1)},$$ where $(\beta_2 \geq \beta_1)$ equals 1 if $\beta_2 \geq \beta_1$ and 0 otherwise.
expr1 = 2 * beta1 - ex.exp(-beta2) / (beta3 * (beta2 >= beta1))
print(expr1)
The evaluation of expressions can be done in two ways. For simple expressions, the function getValue(), implemented in Python, returns the value of the expression.
expr1.getValue()
It is possible to modify the values of the parameters
newvalues = {'beta1':1,'beta2':2,'beta3':3,'beta4':2}
expr1.changeInitValues(newvalues)
expr1.getValue()
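As a cross-check of the two values above, the formula for $e_1$ can be evaluated by hand in plain Python. This is only a sketch that does not use Biogeme: the helper e1 is hypothetical, and it assumes that all values listed in newvalues are applied to the parameters.

```python
import math

def e1(beta1, beta2, beta3):
    """Evaluate e_1 = 2*beta1 - exp(-beta2) / (beta3 * (beta2 >= beta1))."""
    # The comparison acts as an indicator: 1.0 if true, 0.0 otherwise.
    indicator = 1.0 if beta2 >= beta1 else 0.0
    return 2 * beta1 - math.exp(-beta2) / (beta3 * indicator)

# Initial values declared above: beta1 = 0, beta2 = 0, beta3 = 1.
value_initial = e1(0, 0, 1)
# Values after the update: beta1 = 1, beta2 = 2, beta3 = 3.
value_updated = e1(1, 2, 3)
print(value_initial, value_updated)
```

Note that this plain Python version raises a ZeroDivisionError when $\beta_2 < \beta_1$, as the indicator in the denominator is then zero.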
The function getValue_c() is implemented in C++, and works for any expression. It requires a database as input, and evaluates the expression for each entry in the database. In the following example, as no variable of the database is involved in the expression, the output of the expression is the same for each entry.
expr1.getValue_c(myData)
The following function scans the expression and extracts the set of all free parameters.
expr1.setOfBetas()
Options can be set to extract free parameters, fixed parameters, or both.
expr1.setOfBetas(free=False,fixed=True)
expr1.setOfBetas(free=True,fixed=True)
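The idea behind this extraction can be sketched as a recursive scan of the expression tree. The classes ToyBeta and ToyNode and the function set_of_betas below are hypothetical stand-ins, not the Biogeme implementation. The sketch assumes that status 0 denotes a free parameter and status 1 a fixed one.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ToyBeta:
    name: str
    status: int  # assumption: 0 = free (to be estimated), 1 = fixed

@dataclass(frozen=True)
class ToyNode:
    children: tuple  # sub-expressions of an operator node

def set_of_betas(node, free=True, fixed=False):
    """Collect the names of the parameters found in the tree."""
    if isinstance(node, ToyBeta):
        wanted = (free and node.status == 0) or (fixed and node.status == 1)
        return {node.name} if wanted else set()
    names = set()
    for child in node.children:
        names |= set_of_betas(child, free, fixed)
    return names

b1, b2, b3 = ToyBeta('beta1', 0), ToyBeta('beta2', 0), ToyBeta('beta3', 1)
# A tree involving the same parameters as expr1 (beta2 and beta1 appear twice).
tree = ToyNode((b1, ToyNode((b2,)), ToyNode((b3, ToyNode((b2, b1))))))

print(set_of_betas(tree))
print(set_of_betas(tree, free=False, fixed=True))
print(set_of_betas(tree, free=True, fixed=True))
```

Using a set means that a parameter appearing several times in the expression is reported only once.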
expr1.getElementaryExpression('beta2')
Let's consider an expression involving two variables $V_1$ and $V_2$: $$e_2 =2 \beta_1 V_1 - \frac{\exp(-\beta_2 V_2) }{ \beta_3 (\beta_2 \geq \beta_1)}.$$ Note that, in our example, the second term is numerically negligible with respect to the first one.
expr2 = 2 * beta1 * Variable1 - ex.exp(-beta2*Variable2) / (beta3 * (beta2 >= beta1))
print(expr2)
It is not a simple expression anymore, and only the function getValue_c can be invoked.
expr2.getValue_c(myData)
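To see why the second term is negligible, $e_2$ can be evaluated row by row in plain Python. This sketch does not use Biogeme. It assumes that the modified parameter values (beta1 = 1, beta2 = 2, beta3 = 3) are in effect, and the helper e2 is hypothetical.

```python
import math

variable1 = [10, 20, 30, 40, 50]
variable2 = [100, 200, 300, 400, 500]
beta1, beta2, beta3 = 1, 2, 3  # assumption: values set by changeInitValues

def e2(v1, v2):
    """Evaluate e_2 for one entry of the database."""
    indicator = 1.0 if beta2 >= beta1 else 0.0
    first_term = 2 * beta1 * v1
    # exp(-beta2 * v2) is at most exp(-200) here, far below machine precision
    # relative to the first term.
    second_term = math.exp(-beta2 * v2) / (beta3 * indicator)
    return first_term - second_term

values = [e2(v1, v2) for v1, v2 in zip(variable1, variable2)]
print(values)
```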
The following function extracts the names of the parameters appearing in the expression.
expr2.setOfBetas(free=True,fixed=True)
The list of parameters can also be obtained in the form of a dictionary.
expr2.dictOfBetas(free=True,fixed=True)
The list of variables can also be obtained in the form of a dictionary
expr2.dictOfVariables()
or a set...
expr2.setOfVariables()
Expressions are defined recursively, using a tree representation. The following function describes the type of the uppermost node of the tree.
expr2.getClassName()
The signature is a formal representation of the expression, assigning identifiers to each node of the tree, and representing them starting from the leaves. It is easy to parse, and is passed to the C++ implementation.
expr2.getSignature()
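The elementary expressions are the free parameters, the fixed parameters, the random variables (used for numerical integration), the draws (used for Monte Carlo integration), and the variables of the database.
The following function extracts all elementary expressions from a list of formulas, gives them a unique numbering, and returns them organized by group, as defined above (with the exception of the variables, which are directly available in the database).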
collectionOfFormulas = [expr1,expr2]
elementaryExpressionIndex,allFreeBetas,freeBetaNames,allFixedBetas,fixedBetaNames,allRandomVariables,randomVariableNames,allDraws,drawNames = ex.defineNumberingOfElementaryExpressions(collectionOfFormulas,list(myData.data.columns))
Unique numbering for all elementary expressions
elementaryExpressionIndex
allFreeBetas
Each elementary expression has two ids. One unique across all elementary expressions, and one unique within each specific group
[(i.uniqueId,i.betaId) for k,i in allFreeBetas.items()]
freeBetaNames
allFixedBetas
[(i.uniqueId,i.betaId) for k,i in allFixedBetas.items()]
fixedBetaNames
allRandomVariables
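The two kinds of ids can be illustrated with a small sketch. The function number_elementary_expressions is hypothetical, and the order of the groups as well as the alphabetical order within each group are assumptions made for the illustration. Biogeme may number the expressions differently.

```python
def number_elementary_expressions(groups):
    """Assign two ids to each name.

    groups: dict mapping a group name to a list of expression names.
    Returns (unique_ids, within_group_ids), where unique_ids is unique
    across all groups and within_group_ids restarts at 0 in each group.
    """
    unique_ids = {}
    within_group_ids = {}
    next_unique = 0
    for group_name, names in groups.items():
        within_group_ids[group_name] = {}
        for local_id, name in enumerate(sorted(names)):
            unique_ids[name] = next_unique
            within_group_ids[group_name][name] = local_id
            next_unique += 1
    return unique_ids, within_group_ids

groups = {
    'free_betas': ['beta2', 'beta1'],
    'fixed_betas': ['beta3'],
    'variables': ['Variable1', 'Variable2'],
}
unique_ids, within_group_ids = number_elementary_expressions(groups)
print(unique_ids)
print(within_group_ids)
```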
Monte Carlo integration is based on draws.
myDraws = ex.bioDraws('myDraws','UNIFORM')
expr3 = ex.MonteCarlo(myDraws*myDraws)
print(expr3)
Note that draws are distinct from random variables, which are used for numerical integration. The expression therefore involves no random variable.
expr3.dictOfRandomVariables()
The following function reports the draws involved in an expression.
expr3.dictOfDraws()
The expression is a Monte-Carlo integration.
expr3.getClassName()
Here is its value. It is an approximation of $\int_0^1 x^2 dx=\frac{1}{3}$.
expr3.getValue_c(myData,numberOfDraws=100000)
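The principle of the approximation can be reproduced with the standard library alone. This is a sketch, independent of Biogeme: the function monte_carlo_x_squared and the seed are choices made for the illustration, and the random number generator differs from the one used by Biogeme, so the digits will not match.

```python
import random

def monte_carlo_x_squared(number_of_draws, seed=0):
    """Approximate the integral of x^2 over [0, 1] by averaging x^2
    over uniform draws."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(number_of_draws):
        x = rng.random()  # uniform draw in [0, 1)
        total += x * x
    return total / number_of_draws

estimate = monte_carlo_x_squared(100_000)
print(estimate)
```

With 100000 draws, the standard error of the estimate is about 0.001, so only the first two or three digits of 1/3 can be expected to be correct.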
Here is its signature.
expr3.getSignature()
The same integral can be calculated using numerical integration, declaring a random variable.
omega = ex.RandomVariable('omega')
Numerical integration calculates integrals between $-\infty$ and $+\infty$. Here, the interval being $[0,1]$, a change of variables is required.
a = 0
b = 1
x = a + (b-a) / ( 1 + ex.exp(-omega))
dx = (b-a) * ex.exp(-omega) * (1+ex.exp(-omega))**(-2)
integrand = x * x
expr4 = ex.Integrate(integrand * dx /(b-a),'omega')
In this case, omega is a random variable.
expr4.dictOfRandomVariables()
print(expr4)
Calculating its value requires the C++ implementation.
expr4.getValue_c(myData)
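The change of variables can be checked numerically without Biogeme. The sketch below applies a plain trapezoidal rule on $[-30, 30]$, which is not necessarily the quadrature used by Biogeme. The function names and the truncation of the interval are assumptions made for the illustration. With $a=0$ and $b=1$, the division by $(b-a)$ has no effect on the value.

```python
import math

a, b = 0.0, 1.0

def transformed_integrand(omega):
    """Integrand x^2 expressed in terms of omega, as in expr4."""
    x = a + (b - a) / (1 + math.exp(-omega))
    dx = (b - a) * math.exp(-omega) * (1 + math.exp(-omega)) ** (-2)
    return x * x * dx / (b - a)

def trapezoid(f, lower, upper, steps):
    """Composite trapezoidal rule with equally spaced points."""
    h = (upper - lower) / steps
    total = 0.5 * (f(lower) + f(upper))
    for i in range(1, steps):
        total += f(lower + i * h)
    return total * h

# The integrand vanishes quickly in both tails, so [-30, 30] is wide enough.
approximation = trapezoid(transformed_integrand, -30.0, 30.0, 6000)
print(approximation)
```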
We now illustrate the Elem function. It takes two arguments: a dictionary, and a formula for the key. For each entry in the database, the formula is evaluated, and its result identifies which formula in the dictionary should be evaluated. Here, if 'Person' is 1, the expression is $$e_1=2 \beta_1 - \frac{\exp(-\beta_2)}{\beta_3 (\beta_2 \geq \beta_1)},$$ and if 'Person' is 2, the expression is $$e_2=2 \beta_1 V_1 - \frac{\exp(-\beta_2 V_2) }{ \beta_3 (\beta_2 \geq \beta_1)}.$$ As Elem is an expression like any other, it can be included in any formula. Here, we illustrate it by dividing the result by 10.
elemExpr = ex.Elem({1:expr1,2:expr2},Person)
expr5 = elemExpr / 10
print(expr5)
expr5.dictOfVariables()
expr5.getValue_c(myData)
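The selection mechanism can be mimicked in plain Python with a dictionary of functions. This is a sketch, not the Biogeme implementation: the helpers e1, e2 and elem are hypothetical, and the parameter values beta1 = 1, beta2 = 2, beta3 = 3 are assumed to be in effect.

```python
import math

person = [1, 1, 1, 2, 2]
variable1 = [10, 20, 30, 40, 50]
variable2 = [100, 200, 300, 400, 500]
beta1, beta2, beta3 = 1, 2, 3  # assumption: values set by changeInitValues
indicator = 1.0 if beta2 >= beta1 else 0.0

def e1(row):
    return 2 * beta1 - math.exp(-beta2) / (beta3 * indicator)

def e2(row):
    return (2 * beta1 * row['Variable1']
            - math.exp(-beta2 * row['Variable2']) / (beta3 * indicator))

def elem(dictionary, key, row):
    """Evaluate the key for this row, then the formula it selects."""
    return dictionary[key(row)](row)

rows = [{'Person': p, 'Variable1': v1, 'Variable2': v2}
        for p, v1, v2 in zip(person, variable1, variable2)]
expr5_values = [elem({1: e1, 2: e2}, lambda r: r['Person'], row) / 10
                for row in rows]
print(expr5_values)
```

The first three entries use $e_1$, which does not depend on the data, and the last two use $e_2$.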
The next expression is simply the sum of multiple expressions. The argument is a list of expressions.
expr6 = ex.bioMultSum([expr1,expr2,expr4])
print(expr6)
expr6.getValue_c(myData,100000)
We now illustrate how to calculate the logarithm of a logit model, where the choice probability is $$ \frac{y_1 e^{V_1}}{y_0 e^{V_0}+y_1 e^{V_1}+y_2 e^{V_2}}$$ with $V_0=-\beta_1$, $V_1=-\beta_2$ and $V_2=-\beta_1$, and $y_i = 1$, $i=0,1,2$.
V = {0:-beta1,1:-beta2,2:-beta1}
av = {0:1,1:1,2:1}
expr7 = ex.LogLogit(V,av,1)
expr7.getValue()
It is actually better to use the C++ implementation, available in the module models.
import biogeme.models as models
expr8 = models.loglogit(V,av,1)
expr8.getValue_c(myData)
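For reference, the logarithm of the logit probability can be written in a few lines of plain Python. The function below is a sketch and not the Biogeme implementation. It assumes that the chosen alternative is available and that beta1 = 1 and beta2 = 2.

```python
import math

def loglogit(utilities, availabilities, chosen):
    """Logarithm of y_c exp(V_c) / sum_j y_j exp(V_j).

    Assumes that the chosen alternative is available (y_c = 1).
    """
    denominator = sum(availabilities[j] * math.exp(utilities[j])
                      for j in utilities)
    return utilities[chosen] - math.log(denominator)

beta1, beta2 = 1, 2  # assumption: values set by changeInitValues
V = {0: -beta1, 1: -beta2, 2: -beta1}
av = {0: 1, 1: 1, 2: 1}

log_probability = loglogit(V, av, 1)
print(log_probability)

# Sanity check: the probabilities of all alternatives sum up to one.
total = sum(math.exp(loglogit(V, av, j)) for j in V)
print(total)
```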
As the result is a numpy array, it can be used for any calculation. Here, we show how to calculate the logsum
for v in V.values():
    print(v.getValue_c(myData))
# Sum over the alternatives (axis 0) to obtain one logsum per entry of the database
logsum = np.log(np.sum([np.exp(v.getValue_c(myData)) for v in V.values()],axis=0))
logsum
It is possible to calculate the derivative of a formula with respect to a literal: $$e_9=\frac{\partial e_8}{\partial \beta_2}.$$
expr9 = ex.Derive(expr8,'beta2')
expr9.getValue_c(myData)
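The derivative can be verified by hand. As $V_1=-\beta_2$ is the only utility involving $\beta_2$, the derivative of the log of the logit probability $P_1$ is $-(1-P_1)$. The sketch below compares this formula with a central finite difference. It does not use Biogeme, the function names are hypothetical, and beta1 = 1 and beta2 = 2 are assumed.

```python
import math

def log_prob_alt1(beta1, beta2):
    """Log of the logit probability of alternative 1, all alternatives available."""
    utilities = {0: -beta1, 1: -beta2, 2: -beta1}
    logsum = math.log(sum(math.exp(v) for v in utilities.values()))
    return utilities[1] - logsum

def analytical_derivative(beta1, beta2):
    """d log(P_1) / d beta2 = -(1 - P_1), as dV_1 / d beta2 = -1."""
    prob1 = math.exp(log_prob_alt1(beta1, beta2))
    return -(1.0 - prob1)

def finite_difference(beta1, beta2, step=1e-6):
    """Central finite difference approximation of the same derivative."""
    return (log_prob_alt1(beta1, beta2 + step)
            - log_prob_alt1(beta1, beta2 - step)) / (2 * step)

exact = analytical_derivative(1, 2)
approximate = finite_difference(1, 2)
print(exact, approximate)
```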
Biogeme also provides an approximation of the CDF of the normal distribution: $$e_{10}= \frac{1}{\sigma \sqrt{2\pi}}\int_{-\infty}^t e^{-\frac{(x - \mu)^2}{2\sigma^2}}dx$$
expr10 = ex.bioNormalCdf(Variable1/10-1)
expr10.getValue_c(myData)
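Reference values can be obtained from the error function of the standard library, using $\Phi(t) = \frac{1}{2}\left(1+\operatorname{erf}(t/\sqrt{2})\right)$. The sketch assumes the standard normal case, $\mu=0$ and $\sigma=1$. As Biogeme relies on an approximation, its output may differ slightly in the last digits.

```python
import math

def standard_normal_cdf(t):
    """CDF of the standard normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

variable1 = [10, 20, 30, 40, 50]
# Same argument as in expr10: Variable1 / 10 - 1, that is 0, 1, 2, 3, 4.
cdf_values = [standard_normal_cdf(v / 10 - 1) for v in variable1]
print(cdf_values)
```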
Min and max operators are also available. To avoid any ambiguity with the Python operator, they are called bioMin and bioMax.
expr11 = ex.bioMin(expr5,expr10)
expr11.getValue_c(myData)
expr12 = ex.bioMax(expr5,expr10)
expr12.getValue_c(myData)
For the sake of efficiency, it is possible to specify explicitly a linear function, where each term is the product of a parameter and a variable.
terms = [(beta1,ex.Variable('Variable1')),(beta2,ex.Variable('Variable2')),(beta3,ex.Variable('newvar'))]
expr13 = ex.bioLinearUtility(terms)
expr13.getValue_c(myData)
In terms of specification, it is equivalent to the expression below. But the calculation of the derivatives is more efficient, as the linear structure of the specification is exploited.
expr13bis = beta1 * Variable1 + beta2 * Variable2 + beta3 * newvar
expr13bis.getValue_c(myData)
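The benefit of the linear structure can be seen in a small sketch: the derivative of a linear function with respect to each parameter is simply the variable it multiplies, so no differentiation of an expression tree is needed. The code below is plain Python, not the Biogeme implementation. The function names are hypothetical and the parameter values 1, 2 and 3 are assumed.

```python
variable1 = [10, 20, 30, 40, 50]
variable2 = [100, 200, 300, 400, 500]
newvar = [v1 + v2 for v1, v2 in zip(variable1, variable2)]
betas = [1.0, 2.0, 3.0]  # assumption: beta1, beta2, beta3

def linear_utility(betas, row):
    """Sum of parameter * variable over the terms."""
    return sum(beta * x for beta, x in zip(betas, row))

def gradient(row):
    """Derivatives with respect to the parameters: the variables themselves."""
    return list(row)

rows = list(zip(variable1, variable2, newvar))
utilities = [linear_utility(betas, row) for row in rows]
gradients = [gradient(row) for row in rows]
print(utilities)
print(gradients[0])
```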
A Pythonic way to write a linear utility function
variables = ['v1','v2','v3','cost','time','headway']
coefficients = {v:ex.Beta(f'beta_{v}',0,None,None,0) for v in variables}
terms = [coefficients[v] * ex.Variable(v) for v in variables]
util = sum(terms)
print(util)
The Python library communicates the expressions to the C++ library using a syntax called a "signature". We describe and illustrate now the signature for each expression. Each expression is identified by an identifier provided by Python using the function 'id'.
id(expr1)
< Numeric >{identifier},value (the white spaces around the name of the expression are not part of the actual signature)
ex.Numeric(0).getSignature()
< Beta >{identifier}"name"[status],uniqueId,betaId
beta1.getSignature()
beta3.getSignature()
< Variable >{identifier}"name",uniqueId,variableId
Variable1.getSignature()
< RandomVariable >{identifier}"name",uniqueId,randomVariableId
omega.getSignature()
< bioDraws >{identifier}"name",uniqueId,drawId
myDraws.getSignature()
< operator >{identifier}(numberOfChildren),idFirstChild,idSecondChild,idThirdChild, etc... where the number of identifiers given after the comma matches the reported number of children.
Specific examples are reported below.
< operator >{identifier}(2),idFirstChild,idSecondChild where operator is one of:
- 'Plus'
- 'Minus'
- 'Times'
- 'Divide'
- 'Power'
- 'bioMin'
- 'bioMax'
- 'And'
- 'Or'
- 'Equal'
- 'NotEqual'
- 'LessOrEqual'
- 'GreaterOrEqual'
- 'Less'
- 'Greater'
# Named mySum to avoid shadowing the Python built-in function sum
mySum = beta1 + Variable1
mySum.getSignature()
< operator >{identifier}(1),idChild, where operator is one of:
- 'UnaryMinus'
- 'MonteCarlo'
- 'bioNormalCdf'
- 'PanelLikelihoodTrajectory'
- 'exp'
- 'log'
m = -beta1
m.getSignature()
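The construction of a signature from the leaves up can be sketched with a toy class. Everything below is hypothetical: ToyExpression is not a Biogeme class, a simple counter stands in for the identifier returned by 'id', and the leaves are simplified (no name, status or numbering). Only the operator format described above is reproduced.

```python
import itertools

class ToyExpression:
    """Toy expression node producing a leaves-first signature."""
    _counter = itertools.count(1)  # stands in for Python's id()

    def __init__(self, label, children=()):
        self.label = label
        self.children = tuple(children)
        self.identifier = next(ToyExpression._counter)

    def signature(self):
        lines = []
        # Children first, so that each node appears after its operands.
        for child in self.children:
            lines.extend(child.signature())
        if self.children:
            child_ids = ','.join(str(c.identifier) for c in self.children)
            lines.append(f'<{self.label}>{{{self.identifier}}}'
                         f'({len(self.children)}),{child_ids}')
        else:
            lines.append(f'<{self.label}>{{{self.identifier}}}')
        return lines

beta = ToyExpression('Beta')
variable = ToyExpression('Variable')
plus = ToyExpression('Plus', [beta, variable])
minus = ToyExpression('UnaryMinus', [plus])
toy_signature = minus.signature()
for line in toy_signature:
    print(line)
```

As every node is listed after its children, a parser can build the tree in a single pass, which is consistent with the remark that the signature is easy to parse.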
< LogLogit >{identifier}(nbrOfAlternatives),chosenAlt,altNumber,utility,availability,altNumber,utility,availability, etc.
expr7.getSignature()
< Derive >{identifier},id of expression to derive,unique index of elementary expression
expr9.getSignature()
< Integrate >{identifier},id of expression to integrate,index of random variable
expr4.getSignature()
< Elem >{identifier}(numberOfExpressions),keyId,value1,expression1,value2,expression2, etc...
elemExpr.getSignature()