This webpage is for programmers who need examples of how to use the functions of the biogeme.models module. The examples are designed to illustrate the syntax. They do not correspond to any meaningful model. For examples of models, visit biogeme.epfl.ch.
import datetime
print(datetime.datetime.now())
import biogeme.version as ver
print(ver.getText())
import numpy as np
import pandas as pd
import biogeme.database as db
import biogeme.models as models
import matplotlib.pyplot as plt
df = pd.DataFrame({'Person':[1,1,1,2,2],
'Exclude':[0,0,1,0,1],
'Variable1':[1,2,3,4,5],
'Variable2':[10,20,30,40,50],
'Choice':[1,2,3,1,2],
'Av1':[0,1,1,1,1],
'Av2':[1,1,1,1,1],
'Av3':[0,1,1,1,1]})
myData = db.Database('test',df)
from biogeme.expressions import *
A piecewise linear specification (sometimes called a 'spline') is a continuous function of the variable, defined from a set of thresholds. Between two consecutive thresholds the function is linear, and its slope changes at each threshold, so the function is not differentiable at the thresholds. Consider a variable $t$ and an interval $[a,b]$. We define a new variable $$ x_{[a,b]}(t) = \max(0,\min(t-a,b-a)) = \left\{ \begin{array}{ll} 0 & \text{if } t < a, \\ t-a & \text{if } a \leq t < b, \\ b-a & \text{otherwise}. \end{array} \right. $$
For an interval $]-\infty,a]$, we have $$ x_{]-\infty,a]}(t) = \min(t,a) = \left\{ \begin{array}{ll} t & \text{if } t < a, \\ a & \text{otherwise}. \end{array} \right. $$
For an interval $[a,+\infty[$, we have $$ x_{[a,+\infty[}(t) = \max(0,t-a) = \left\{ \begin{array}{ll} 0 & \text{if } t < a, \\ t-a & \text{otherwise}. \end{array} \right. $$
If we consider a series of thresholds $$\alpha_1 < \alpha_2 < \ldots <\alpha_K,$$ the piecewise linear transform of variable $t$ is $$ \sum_{k=1}^{K-1} \beta_k x_{[\alpha_k,\alpha_{k+1}]}(t),$$ where $\beta_k$ is the slope of the linear function in interval $[\alpha_k,\alpha_{k+1}]$.
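The definitions above can be checked numerically with a minimal pure-Python sketch that does not depend on Biogeme. The helpers interval_variable and piecewise_value are hypothetical names. As with the thresholds passed to piecewiseVariables below, None stands for an infinite bound. The sketch evaluates the transform and confirms that it does not jump at the finite thresholds. It is not Biogeme's implementation.

```python
def interval_variable(t, a, b):
    """x_[a,b](t) from the text; a=None stands for -inf and b=None for +inf."""
    if a is None:  # interval ]-inf, b]
        return min(t, b)
    if b is None:  # interval [a, +inf[
        return max(0.0, t - a)
    return max(0.0, min(t - a, b - a))  # interval [a, b]


def piecewise_value(t, thresholds, betas):
    """Sum of beta_k * x_[alpha_k, alpha_k+1](t) over consecutive thresholds."""
    if len(betas) != len(thresholds) - 1:
        raise ValueError('Expected one beta per interval')
    return sum(beta * interval_variable(t, a, b)
               for beta, a, b in zip(betas, thresholds[:-1], thresholds[1:]))


thresholds = [None, 90, 180, 270, None]
betas = [-0.016806308, -0.010491137, -0.002012234, -0.020051303]

# The slope changes at each finite threshold, but the function has no jump there.
eps = 1e-9
jumps = [abs(piecewise_value(alpha + eps, thresholds, betas)
             - piecewise_value(alpha - eps, thresholds, betas))
         for alpha in thresholds[1:-1]]
print(piecewise_value(100, thresholds, betas), max(jumps))
```

The thresholds and slopes here are the ones used in the statements below.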
The next statement generates the variables, given the thresholds. A 'None' threshold stands for an infinite value. It can only appear first, where it means $-\infty$, or last, where it means $+\infty$.
x = Variable('x')
thresholds = [None,90,180,270,None]
variables = models.piecewiseVariables(x,thresholds)
print(variables)
The next statement automatically generates the formula, including the Beta parameters, which are initialized to zero.
formula = models.piecewiseFormula(x,thresholds)
print(formula)
It is also possible to initialize the Beta parameters with other values.
betas = [-0.016806308,-0.010491137,-0.002012234,-0.020051303]
formula = models.piecewiseFormula(x,thresholds,betas)
print(formula)
The following statements plot the piecewise linear function defined by these thresholds and Beta values.
X = np.arange(0,300,0.1)
Y = [models.piecewiseFunction(t,thresholds,betas) for t in X]
plt.plot(X,Y)
V = {1:Variable('Variable1'), 2:0.1, 3:-0.1}
av = {1:1, 2:0, 3:1}
Calculation of the logit probability (and its logarithm) of each of the three alternatives, taking their availability into account.
p1 = models.logit(V,av,1)
p1.getValue_c(myData)
p1 = models.loglogit(V,av,1)
p1.getValue_c(myData)
p2 = models.logit(V,av,2)
p2.getValue_c(myData)
p2 = models.loglogit(V,av,2)
p2.getValue_c(myData)
p3 = models.logit(V,av,3)
p3.getValue_c(myData)
p3 = models.loglogit(V,av,3)
p3.getValue_c(myData)
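For a single observation, the logit probability with availability conditions is $P(i) = a_i e^{V_i} / \sum_j a_j e^{V_j}$, where $a_j \in \{0,1\}$ is the availability of alternative $j$. The plain-Python sketch below evaluates this formula for the first row of the data, where Variable1 equals 1. The helper logit_probability is hypothetical and is not Biogeme's implementation. The unavailable alternative 2 gets probability zero, and the remaining probabilities sum to one.

```python
import math


def logit_probability(V, av, i):
    """P(i) = av_i exp(V_i) / sum_j av_j exp(V_j); av=None means all available."""
    if av is None:
        av = {j: 1 for j in V}
    denominator = sum(av[j] * math.exp(v) for j, v in V.items())
    return av[i] * math.exp(V[i]) / denominator


V = {1: 1.0, 2: 0.1, 3: -0.1}  # Variable1 set to 1 (first row of the data)
av = {1: 1, 2: 0, 3: 1}
probs = {i: logit_probability(V, av, i) for i in V}
print(probs)
```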
Calculation of the logit probability (and its logarithm) of each of the three alternatives, assuming that they are all available.
pa1 = models.logit(V,av=None,i=1)
pa1.getValue_c(myData)
pa1 = models.loglogit(V,av=None,i=1)
pa1.getValue_c(myData)
pa2 = models.logit(V,av=None,i=2)
pa2.getValue_c(myData)
pa2 = models.loglogit(V,av=None,i=2)
pa2.getValue_c(myData)
pa3 = models.logit(V,av=None,i=3)
pa3.getValue_c(myData)
pa3 = models.loglogit(V,av=None,i=3)
pa3.getValue_c(myData)
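When all alternatives are available, $\log P(i) = V_i - \log \sum_j e^{V_j}$. Computing this as the log of a ratio of exponentials can overflow for large utilities. A standard remedy is the log-sum-exp trick, sketched below in plain Python. This page does not say whether loglogit uses exactly this technique. The helper log_logit is hypothetical.

```python
import math


def log_logit(V, i):
    """log P(i) = V_i - log(sum_j exp(V_j)), all alternatives available.

    The largest utility is subtracted before exponentiating so that exp never overflows.
    """
    v_max = max(V.values())
    log_sum = v_max + math.log(sum(math.exp(v - v_max) for v in V.values()))
    return V[i] - log_sum


V = {1: 1.0, 2: 0.1, 3: -0.1}  # Variable1 set to 1 (first row of the data)
log_probs = {i: log_logit(V, i) for i in V}
print(log_probs)

# math.exp(1000.0) would raise OverflowError, but the shifted computation is fine.
print(log_logit({1: 1000.0, 2: 999.0}, 1))
```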
The Box-Cox transform of a variable $x$ is defined as $$B(x,\ell) = \frac{x^{\ell}-1}{\ell},$$ where $\ell > 0$ is a parameter that can be estimated from data. It has the property that $$\lim_{\ell \to 0} B(x,\ell)=\log(x).$$
x = Variable('Variable1')
models.boxcox(x,4)
x = Variable('Variable1')
models.boxcox(x,0)
l = Variable('Variable2')
e = models.boxcox(x,l)
print(e)
e.getValue_c(myData)
for l in range(1,16):
    print(f'l=10^(-{l}): {models.boxcox(3,10**-l)} - {np.log(3)} = {models.boxcox(3,10**-l) - np.log(3)}')
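The definition and its limit can also be checked numerically in plain Python, without Biogeme. The helper boxcox_value below is hypothetical. It returns $\log(x)$ when $\ell = 0$ and applies the Box-Cox formula otherwise. Expanding $x^\ell = e^{\ell \log x}$ shows that $B(x,\ell) - \log(x)$ shrinks roughly like $\ell (\log x)^2/2$. For very small $\ell$, floating-point cancellation in $x^\ell - 1$ eventually dominates, so this sketch stops at $\ell = 10^{-7}$.

```python
import math


def boxcox_value(x, ell):
    """Box-Cox transform (x**ell - 1) / ell, replaced by its limit log(x) when ell == 0."""
    if ell == 0:
        return math.log(x)
    return (x ** ell - 1.0) / ell


errors = []
for k in range(1, 8):
    ell = 10.0 ** -k
    error = boxcox_value(3.0, ell) - math.log(3.0)
    errors.append(error)
    print(f'ell=10^(-{k}): B(3, ell) - log(3) = {error:.3e}')
```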
In an MEV model, the choice probability of alternative $i$ is $$\frac{e^{V_i + \ln G_i(e^{V_1},\ldots,e^{V_J})}}{\sum_j e^{V_j + \ln G_j(e^{V_1},\ldots,e^{V_J})}},$$ where $G$ is a generating function, and $$G_i=\frac{\partial G}{\partial y_i}(e^{V_1},\ldots,e^{V_J}).$$
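Once the values of $\ln G_i$ are known, the MEV formula can be evaluated directly. The plain-Python sketch below is independent of Biogeme, and mev_probability is a hypothetical name. Its optional av argument, which drops unavailable alternatives from the sum, is an extension that the formula above does not include. As a sanity check, the sketch uses $G(y) = y_1 + y_2 + y_3$. For this function every $G_i = 1$, and the MEV formula reduces to the logit model.

```python
import math


def mev_probability(V, log_G, i, av=None):
    """P(i) = exp(V_i + ln G_i) / sum_j exp(V_j + ln G_j), summing over available j."""
    if av is None:
        av = {j: 1 for j in V}
    terms = {j: av[j] * math.exp(V[j] + log_G[j]) for j in V}
    return terms[i] / sum(terms.values())


V = {1: 1.0, 2: 0.1, 3: -0.1}
# G(y) = y_1 + y_2 + y_3 gives G_i = 1 for all i, so ln G_i = 0: the logit model.
log_G = {j: 0.0 for j in V}
probs = {i: mev_probability(V, log_G, i) for i in V}
print(probs)
```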
The $G$ function for the nested logit model is defined such that $$G_i=\frac{\partial G}{\partial y_i}(e^{V_1},\ldots,e^{V_J}) = \mu e^{(\mu_m-1)V_i} \left(\sum_{j \in C_m} e^{\mu_m V_j}\right)^{\frac{\mu}{\mu_m}-1},$$ where the choice set is partitioned into $M$ nests $C_1,\ldots,C_M$, $m$ is the nest containing alternative $i$, each nest $m$ is associated with a parameter $\mu_m$, and $\mu$ is the scale parameter. The condition $0 \leq \mu \leq \mu_m$ must be verified for each nest $m$. In general, $\mu$ is normalized to 1.0.
V = {1:Variable('Variable1'), 2:0.1, 3:-0.1, 4:-0.2, 5:0.2 }
av = {1:1, 2:0, 3:1, 4:1, 5:1}
nestA = 1.2, [1,2,4]
nestB = 2.3, [3,5]
p1 = models.nested(V,availability=av,nests=(nestA,nestB),choice=1)
p1.getValue_c(myData)
If all the alternatives are available, set the availability argument to None.
p1 = models.nested(V,availability=None,nests=(nestA,nestB),choice=1)
p1.getValue_c(myData)
p2 = models.lognested(V,availability=av,nests=(nestA,nestB),choice=1)
p2.getValue_c(myData)
p2 = models.lognested(V,availability=None,nests=(nestA,nestB),choice=1)
p2.getValue_c(myData)
If the value of the parameter $\mu$ is not 1, call nestedMevMu (or lognestedMevMu for the logarithm) instead, with the value of $\mu$ passed as mu. Note that, for the sake of computational efficiency, the code does not check whether the condition $$0 \leq \mu \leq \mu_m$$ holds.
p1 = models.nestedMevMu(V,availability=av,nests=(nestA,nestB),choice=1,mu=1.1)
p1.getValue_c(myData)
p1 = models.nestedMevMu(V,availability=None,nests=(nestA,nestB),choice=1,mu=1.1)
p1.getValue_c(myData)
p1 = models.lognestedMevMu(V,availability=av,nests=(nestA,nestB),choice=1,mu=1.1)
p1.getValue_c(myData)
p1 = models.lognestedMevMu(V,availability=None,nests=(nestA,nestB),choice=1,mu=1.1)
p1.getValue_c(myData)
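Plugging the nested logit formula for $G_i$ into the MEV formula should give the decomposition $P(i) = P(i \mid m) P(m)$. Here $P(i \mid m) = e^{\mu_m V_i}/S_m$, $P(m) = S_m^{\mu/\mu_m} / \sum_n S_n^{\mu/\mu_n}$, and $S_m = \sum_{j \in C_m} e^{\mu_m V_j}$, where $C_m$ is the set of alternatives in nest $m$. The plain-Python sketch below checks this for $\mu = 1$ and $\mu = 1.1$, using the utilities and nests above. It assumes, for simplicity, that Variable1 equals 1 and that all alternatives are available. The function names are hypothetical, and the sketch makes no claim about how nested or nestedMevMu are implemented.

```python
import math


def nested_log_G(V, nests, mu=1.0):
    """ln G_i for the nested logit model.

    nests is a sequence of (mu_m, list of alternatives in nest m), assumed to partition V.
    """
    log_G = {}
    for mu_m, members in nests:
        log_S = math.log(sum(math.exp(mu_m * V[j]) for j in members))
        for i in members:
            log_G[i] = math.log(mu) + (mu_m - 1.0) * V[i] + (mu / mu_m - 1.0) * log_S
    return log_G


def mev_probabilities(V, log_G):
    """MEV formula with all alternatives available."""
    denominator = sum(math.exp(V[j] + log_G[j]) for j in V)
    return {i: math.exp(V[i] + log_G[i]) / denominator for i in V}


def nested_decomposition(V, nests, mu=1.0):
    """Same probabilities written as P(i | m) * P(m)."""
    S = [sum(math.exp(mu_m * V[j]) for j in members) for mu_m, members in nests]
    weights = [s ** (mu / mu_m) for s, (mu_m, _) in zip(S, nests)]
    total = sum(weights)
    probs = {}
    for s, w, (mu_m, members) in zip(S, weights, nests):
        for i in members:
            probs[i] = math.exp(mu_m * V[i]) / s * w / total
    return probs


V = {1: 1.0, 2: 0.1, 3: -0.1, 4: -0.2, 5: 0.2}
nests = ((1.2, [1, 2, 4]), (2.3, [3, 5]))
for mu in (1.0, 1.1):
    p_mev = mev_probabilities(V, nested_log_G(V, nests, mu))
    p_dec = nested_decomposition(V, nests, mu)
    print(mu, max(abs(p_mev[i] - p_dec[i]) for i in V))
```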
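The validity of the nested structure can be verified with checkValidityNestedLogit. The nests must form a partition of the choice set, that is, each alternative must belong to exactly one nest.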
models.checkValidityNestedLogit(V,(nestA,nestB))
If an alternative (here, alternative 2) does not belong to any nest:
nestA = 1.2, [1,4]
nestB = 2.3, [3,5]
models.checkValidityNestedLogit(V,(nestA,nestB))
If an alternative (here, alternative 3) belongs to two nests:
nestA = 1.2, [1,2,3,4]
nestB = 2.3, [3,5]
models.checkValidityNestedLogit(V,(nestA,nestB))
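The check amounts to counting, for each alternative, how many nests contain it. The plain-Python sketch below reproduces the three cases above. The helper check_partition is hypothetical. Its (ok, message) return value is our own choice and may differ from the format that checkValidityNestedLogit returns.

```python
def check_partition(V, nests):
    """Return (ok, message); ok is True if each alternative of V is in exactly one nest."""
    counts = {i: 0 for i in V}
    unknown = []
    for _, members in nests:
        for i in members:
            if i in counts:
                counts[i] += 1
            else:
                unknown.append(i)
    problems = []
    missing = [i for i, c in counts.items() if c == 0]
    repeated = [i for i, c in counts.items() if c > 1]
    if missing:
        problems.append(f'Alternatives in no nest: {missing}')
    if repeated:
        problems.append(f'Alternatives in several nests: {repeated}')
    if unknown:
        problems.append(f'Unknown alternatives: {unknown}')
    return not problems, '; '.join(problems)


V = {1: 1.0, 2: 0.1, 3: -0.1, 4: -0.2, 5: 0.2}
valid = ((1.2, [1, 2, 4]), (2.3, [3, 5]))
missing_two = ((1.2, [1, 4]), (2.3, [3, 5]))
three_twice = ((1.2, [1, 2, 3, 4]), (2.3, [3, 5]))
for nests in (valid, missing_two, three_twice):
    print(check_partition(V, nests))
```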
The $G$ function for the cross nested logit model is defined such that $$G_i=\frac{\partial G}{\partial y_i}(e^{V_1},\ldots,e^{V_J}) = \mu \sum_{m=1}^{M} \alpha_{im}^{\frac{\mu_m}{\mu}} e^{(\mu_m-1) V_i}\left( \sum_{j=1}^{J} \alpha_{jm}^{\frac{\mu_m}{\mu}} e^{\mu_m V_j} \right)^{\frac{\mu}{\mu_m}-1},$$ where each nest $m$ is associated with a parameter $\mu_m$ and, for each alternative $i$, a parameter $\alpha_{im} \geq 0$ that captures the degree of membership of alternative $i$ in nest $m$. The scale parameter is $\mu$. For each alternative $i$, there must be at least one nest $m$ such that $\alpha_{im}>0$. The condition $0 \leq \mu \leq \mu_m$ must also be verified. In general, $\mu$ is normalized to 1.0.
V = {1:Variable('Variable1'), 2:0.1, 3:-0.1, 4:-0.2, 5:0.2 }
av = {1:1, 2:0, 3:1, 4:1, 5:1}
alphaA = {1:1, 2:1, 3:0.5, 4:0, 5:0}
alphaB = {1:0, 2:0, 3:0.5, 4:1, 5:1}
nestA = 1.2, alphaA
nestB = 2.3, alphaB
p1 = models.cnl(V,availability=av,nests=(nestA,nestB),choice=1)
p1.getValue_c(myData)
If all the alternatives are available, set the availability argument to None.
p1 = models.cnl(V,availability=None,nests=(nestA,nestB),choice=1)
p1.getValue_c(myData)
If the value of the parameter $\mu$ is not 1, call cnlmu instead, with the value of $\mu$ passed as mu. Note that, for the sake of computational efficiency, the code does not check whether the condition $$0 \leq \mu \leq \mu_m$$ holds.
p1 = models.cnlmu(V,availability=av,nests=(nestA,nestB),choice=1,mu=1.1)
p1.getValue_c(myData)
p1 = models.cnlmu(V,availability=None,nests=(nestA,nestB),choice=1,mu=1.1)
p1.getValue_c(myData)
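The cross nested formula for $G_i$ can be evaluated the same way. The plain-Python sketch below computes $\ln G_i$ from the nest parameters and the $\alpha_{im}$ above, and then the MEV probabilities. The function name cnl_log_G is hypothetical. For simplicity, the sketch sets Variable1 to 1 and treats all alternatives as available. When every $\alpha_{im}$ is 0 or 1 and each alternative belongs to exactly one nest, the formula reduces to the nested logit $G_i$. That reduction gives a convenient consistency check.

```python
import math


def cnl_log_G(V, nests, mu=1.0):
    """ln G_i for the cross nested logit model.

    nests is a sequence of (mu_m, alpha_m), where alpha_m maps alternatives to alpha_im >= 0.
    Each alternative must have alpha_im > 0 for at least one nest, otherwise log(0) fails.
    """
    G = {i: 0.0 for i in V}
    for mu_m, alpha in nests:
        weight = {j: alpha.get(j, 0.0) ** (mu_m / mu) for j in V}
        nest_sum = sum(weight[j] * math.exp(mu_m * V[j]) for j in V)
        for i in V:
            if weight[i] > 0:
                G[i] += (mu * weight[i] * math.exp((mu_m - 1.0) * V[i])
                         * nest_sum ** (mu / mu_m - 1.0))
    return {i: math.log(g) for i, g in G.items()}


V = {1: 1.0, 2: 0.1, 3: -0.1, 4: -0.2, 5: 0.2}
alphaA = {1: 1, 2: 1, 3: 0.5, 4: 0, 5: 0}
alphaB = {1: 0, 2: 0, 3: 0.5, 4: 1, 5: 1}
log_G = cnl_log_G(V, ((1.2, alphaA), (2.3, alphaB)))
denominator = sum(math.exp(V[j] + log_G[j]) for j in V)
probs = {i: math.exp(V[i] + log_G[i]) / denominator for i in V}
print(probs)
```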
If the sample is endogenous, a correction must be included in the model, as proposed by Bierlaire, Bolduc and McFadden (2008). In this case, the generating function must first be defined, and then the MEV model with correction is called.
logGi = models.getMevForCrossNested(V,availability=av,nests=(nestA,nestB))
logGi
correction = {1:-0.1, 2:0.1, 3:0.2, 4:-0.2, 5:0}
p1 = models.mev_endogenousSampling(V,logGi,av,correction,choice=1)
p1.getValue_c(myData)
correction = {1:-0.1, 2:0.1, 3:0.2, 4:-0.2, 5:0}
p1 = models.logmev_endogenousSampling(V,logGi,av,correction,choice=1)
p1.getValue_c(myData)
correction = {1:-0.1, 2:0.1, 3:0.2, 4:-0.2, 5:0}
p1 = models.mev_endogenousSampling(V,logGi,av=None,correction=correction,choice=1)
p1.getValue_c(myData)
correction = {1:-0.1, 2:0.1, 3:0.2, 4:-0.2, 5:0}
p1 = models.logmev_endogenousSampling(V,logGi,av=None,correction=correction,choice=1)
p1.getValue_c(myData)
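The plain-Python sketch below computes the corrected probability under one assumption that this page does not state: the correction $\omega_i$ of alternative $i$ is added to the exponent of the MEV formula, giving $P(i) = a_i e^{V_i + \ln G_i + \omega_i} / \sum_j a_j e^{V_j + \ln G_j + \omega_j}$. For simplicity, it uses $\ln G_i = 0$ (the logit case) instead of the cross nested logGi above. The name mev_endogenous_probability is hypothetical. With a zero correction, the sketch falls back to the uncorrected model.

```python
import math


def mev_endogenous_probability(V, log_G, correction, i, av=None):
    """MEV probability with an additive correction in each exponent (assumed form).

    P(i) = av_i exp(V_i + ln G_i + omega_i) / sum_j av_j exp(V_j + ln G_j + omega_j)
    """
    if av is None:
        av = {j: 1 for j in V}
    terms = {j: av[j] * math.exp(V[j] + log_G[j] + correction[j]) for j in V}
    return terms[i] / sum(terms.values())


V = {1: 1.0, 2: 0.1, 3: -0.1, 4: -0.2, 5: 0.2}
log_G = {j: 0.0 for j in V}  # logit case, for illustration only
correction = {1: -0.1, 2: 0.1, 3: 0.2, 4: -0.2, 5: 0}
av = {1: 1, 2: 0, 3: 1, 4: 1, 5: 1}
p = {i: mev_endogenous_probability(V, log_G, correction, i, av) for i in V}
print(p)
```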
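The MEV generating functions of the following models are available. For each alternative $i$, they provide the expression of $\ln G_i$, which can be passed as logGi to mev_endogenousSampling as above.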
Nested logit model
V = {1:Variable('Variable1'), 2:0.1, 3:-0.1, 4:-0.2, 5:0.2 }
av = {1:1, 2:0, 3:1, 4:1, 5:1}
nestA = Beta('muA',1.2,1.0,None,0), [1,2,4]
nestB = Beta('muB',2.3,1.0,None,0), [3,5]
logGi = models.getMevForNested(V,availability=None,nests=(nestA,nestB))
logGi
Nested logit model with $\mu$ parameter
logGi = models.getMevForNestedMu(V,availability=None,nests=(nestA,nestB),mu=1.1)
logGi
Cross nested logit model
V = {1:Variable('Variable1'), 2:0.1, 3:-0.1, 4:-0.2, 5:0.2 }
av = {1:1, 2:0, 3:1, 4:1, 5:1}
alphaA = {1:1, 2:1, 3:0.5, 4:0, 5:0}
alphaB = {1:0, 2:0, 3:0.5, 4:1, 5:1}
nestA = Beta('muA',1.2,1.0,None,0), alphaA
nestB = Beta('muB',2.3,1.0,None,0), alphaB
logGi = models.getMevForCrossNested(V,availability=None,nests=(nestA,nestB))
logGi
Cross nested logit model with $\mu$ parameter
logGi = models.getMevForCrossNestedMu(V,availability=None,nests=(nestA,nestB),mu=1.1)
logGi