This webpage is for programmers who need examples of how to use the functions of the Database class. The examples are designed to illustrate the syntax; they do not correspond to any meaningful model. For examples of models, visit biogeme.epfl.ch.
import datetime
print(datetime.datetime.now())
import biogeme.version as ver
print(ver.getText())
import biogeme.database as db
import pandas as pd
import numpy as np
# Import only the names used in this notebook instead of a wildcard import.
from biogeme.expressions import Variable, bioDraws, exp
np.random.seed(90267)
df = pd.DataFrame({'Person':[1,1,1,2,2],'Exclude':[0,0,1,0,1],'Variable1':[1,2,3,4,5],'Variable2':[10,20,30,40,50],'Choice':[1,2,3,1,2],'Av1':[0,1,1,1,1],'Av2':[1,1,1,1,1],'Av3':[0,1,1,1,1]})
myData = db.Database('test',df)
print(myData)
Evaluates an expression for each entry of the database.
Args:
expression: object of type biogeme.expressions.
Returns:
numpy series, with one entry per row of the database, containing the calculated quantities.
Variable1=Variable('Variable1')
Variable2=Variable('Variable2')
expr = Variable1 + Variable2
result = myData.valuesFromDatabase(expr)
print(result)
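For readers more familiar with pandas, the following sketch reproduces the same row-by-row computation directly on a DataFrame. It only illustrates what the returned series contains and is not Biogeme's implementation; the name `expected` is introduced here for illustration.

```python
import pandas as pd

# Same two columns as in the example database above.
df = pd.DataFrame({'Variable1': [1, 2, 3, 4, 5],
                   'Variable2': [10, 20, 30, 40, 50]})

# Row-wise evaluation of Variable1 + Variable2 (hypothetical name `expected`).
expected = (df['Variable1'] + df['Variable2']).to_numpy()
print(expected.tolist())  # [11, 22, 33, 44, 55]
```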
Check if the chosen alternative is available for each entry in the database.
Args:
avail: dictionary of biogeme.expressions, indexed by the ID of each alternative, to evaluate the availability conditions for each alternative.
choice: biogeme.expressions to evaluate the chosen alternative.
Returns:
numpy series of bool, with one entry per row of the database, containing True if the chosen alternative is available, False otherwise.
Av1=Variable('Av1')
Av2=Variable('Av2')
Av3=Variable('Av3')
Choice=Variable('Choice')
avail = {1:Av1,2:Av2,3:Av3}
result = myData.checkAvailabilityOfChosenAlt(avail,Choice)
print(result)
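The check can be pictured as a per-row lookup: take the chosen alternative, then read the availability column associated with it. The sketch below does this in plain pandas on the example data. It is an illustration only, and the names `avail_columns` and `chosen_available` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'Choice': [1, 2, 3, 1, 2],
                   'Av1': [0, 1, 1, 1, 1],
                   'Av2': [1, 1, 1, 1, 1],
                   'Av3': [0, 1, 1, 1, 1]})

# Map each alternative ID to its availability column (hypothetical helper).
avail_columns = {1: 'Av1', 2: 'Av2', 3: 'Av3'}

# For each row, look up the availability of the chosen alternative.
chosen_available = [
    bool(row[avail_columns[int(row['Choice'])]] != 0)
    for _, row in df.iterrows()
]
print(chosen_available)  # [False, True, True, True, True]
```

The first row is flagged because alternative 1 is chosen while `Av1` is 0.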
Calculates the value of an expression for each entry in the database, and returns the sum.
Args:
expression: object of type biogeme.expressions
Returns:
Sum of the expressions over the database.
Variable1=Variable('Variable1')
Variable2=Variable('Variable2')
expression = Variable2 / Variable1
result = myData.sumFromDatabase(expression)
print(result)
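As a cross-check, the same sum can be computed by hand with pandas on the unscaled example data. This is a sketch for comparison, not the library's code; `total` is a name chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Variable1': [1, 2, 3, 4, 5],
                   'Variable2': [10, 20, 30, 40, 50]})

# Each row gives Variable2 / Variable1 = 10, so the sum over 5 rows is 50.
total = float((df['Variable2'] / df['Variable1']).sum())
print(total)  # 50.0
```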
Suggest a scaling of the variables in the database
Returns:
A Pandas dataframe where each row contains the name of
the variable and the suggested scale s. Ideally, the column
should be multiplied by s.
myData.suggestScaling()
Multiply an entire column by a scale value
Args:
column: name of the column
scale: value of the scale. All values of the column will
be multiplied by that scale.
myData.data
myData.scaleColumn('Variable2',0.01)
myData.data
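Assuming the column is multiplied by the scale, as the argument description states, the effect of the call above corresponds to the following pandas operation. The sketch works on a standalone DataFrame rather than on the Biogeme database.

```python
import pandas as pd

df = pd.DataFrame({'Variable2': [10, 20, 30, 40, 50]})

# Multiply every value of the column by the scale.
scale = 0.01
df['Variable2'] = df['Variable2'] * scale
print(df['Variable2'].round(6).tolist())
```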
Add a new column in the database, calculated from an expression.
Args:
expression: object of type biogeme.expressions describing the expression to evaluate
column: name of the column to add.
Returns:
nothing
Raises:
ValueError: if the column name already exists.
Variable1=Variable('Variable1')
Variable2=Variable('Variable2')
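# Two candidate expressions: the second assignment overrides the first,
# so 'NewVariable' is computed as Variable2 * Variable1.
expression = exp(0.5*Variable2) / Variable1
expression = Variable2 * Variable1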
result = myData.addColumn(expression,'NewVariable')
print(myData.data['NewVariable'].tolist())
Counts the number of observations that have a specific value in a given column.
Args:
columnName: name of the column.
value: value that is sought.
Returns:
Number of times that the value appears in the column.
# Count the number of entries for individual 1.
myData.count('Person',1)
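In pandas terms, counting amounts to comparing the column with the value and summing the resulting booleans. The sketch below uses the `Person` column of the example data; `n_person_1` is a hypothetical name.

```python
import pandas as pd

df = pd.DataFrame({'Person': [1, 1, 1, 2, 2]})

# True counts as 1 when summed, so this is the number of matching rows.
n_person_1 = int((df['Person'] == 1).sum())
print(n_person_1)  # 3
```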
Removes from the database all entries such that the value of the expression is not 0.
Args:
expression: object of type biogeme.expressions describing the expression to evaluate
Returns:
Nothing.
Exclude=Variable('Exclude')
myData.remove(Exclude)
myData.data
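The removal rule (drop every row where the expression is not 0) can be sketched with a boolean mask in pandas. This is an illustration on a copy of the example columns, not the library's implementation; `kept` is a name introduced here.

```python
import pandas as pd

df = pd.DataFrame({'Person': [1, 1, 1, 2, 2],
                   'Exclude': [0, 0, 1, 0, 1]})

# Keep only the rows where the exclusion expression evaluates to 0.
kept = df[df['Exclude'] == 0]
print(kept['Person'].tolist())  # [1, 1, 2]
```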
Dumps the database in a CSV formatted file.
Returns: name of the file
myData.dumpOnFile()
%%bash
cat test_dumped.dat
Generate draws for each variable.
Args:
types:
A dict indexed by the names of the variables,
describing the types of draws. Each of them can be a
native type or any type defined by the function
database.setRandomNumberGenerators
names:
the list of names of the variables that require draws to be generated.
numberOfDraws:
number of draws to generate.
Returns:
a 3-dimensional table with draws. The 3 dimensions are
1. number of individuals
2. number of draws
3. number of variables
List native types and their description
myData.descriptionOfNativeDraws()
randomDraws1 = bioDraws('randomDraws1','NORMAL_MLHS_ANTI')
randomDraws2 = bioDraws('randomDraws2','UNIFORM_MLHS_ANTI')
randomDraws3 = bioDraws('randomDraws3','UNIFORMSYM_MLHS_ANTI')
# We build an expression that involves the three random variables
x = randomDraws1 + randomDraws2 + randomDraws3
types = x.dictOfDraws()
print(types)
theDrawsTable = myData.generateDraws(
    types,
    ['randomDraws1', 'randomDraws2', 'randomDraws3'],
    10,
)
theDrawsTable
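To make the three-dimensional layout concrete, the sketch below builds a table with the same axis order (individuals, draws, variables) using NumPy alone. Plain pseudo-random draws stand in for the MLHS antithetic draws, the sizes are illustrative, and the assumption that the symmetric uniform type covers [-1, 1] is mine; none of this is Biogeme's code.

```python
import numpy as np

rng = np.random.default_rng(90267)
n_individuals, n_draws = 5, 10  # illustrative sizes

# One (individuals x draws) matrix per variable.
per_variable = [
    rng.standard_normal((n_individuals, n_draws)),     # normal draws
    rng.uniform(0.0, 1.0, (n_individuals, n_draws)),   # uniform on [0, 1)
    rng.uniform(-1.0, 1.0, (n_individuals, n_draws)),  # assumed symmetric uniform
]

# Stack along a new last axis: (individuals, draws, variables).
draws_table = np.stack(per_variable, axis=-1)
print(draws_table.shape)  # (5, 10, 3)
```

With this layout, `draws_table[i, :, k]` holds all draws of variable `k` for individual `i`.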
Defines user-defined random number generators.
Args:
rng: a dictionary of generators. The keys of the dictionary
are the names of the generators, and must be
different from the pre-defined generators in Biogeme:
NORMAL, UNIFORM and UNIFORMSYM. The generators are
functions that take two arguments: the
number of series to generate (typically, the size of the
database), and the number of draws per series. In the example
that follows, each dictionary element is a tuple pairing such a
function with a text description.
Returns:
nothing.
# We first define functions returning draws, given the number of observations, and the number of draws
def logNormalDraws(sampleSize, numberOfDraws):
    return np.exp(np.random.randn(sampleSize, numberOfDraws))
def exponentialDraws(sampleSize, numberOfDraws):
    return -1.0 * np.log(np.random.rand(sampleSize, numberOfDraws))
# We associate these functions with a name
# (the name `generators` avoids shadowing the built-in `dict`)
generators = {'LOGNORMAL':(logNormalDraws,'Draws from lognormal distribution'),'EXP':(exponentialDraws,'Draws from exponential distributions')}
myData.setRandomNumberGenerators(generators)
# We can now generate draws from these distributions
randomDraws1 = bioDraws('randomDraws1','LOGNORMAL')
randomDraws2 = bioDraws('randomDraws2','EXP')
x = randomDraws1 + randomDraws2
types = x.dictOfDraws()
theDrawsTable = myData.generateDraws(types,['randomDraws1','randomDraws2'],10)
print(theDrawsTable)
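The `exponentialDraws` function above relies on inverse transform sampling: if U is uniform on (0, 1), then -log(U) follows an exponential distribution with rate 1. The standalone sketch below checks this numerically with NumPy. The sample sizes are arbitrary, and `1 - u` is used instead of `u` only to rule out `log(0)`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Same (series x draws) layout as a user-defined generator would return.
u = rng.random((1000, 100))   # uniform on [0, 1)
draws = -np.log(1.0 - u)      # 1 - u lies in (0, 1], so the log is finite

# The mean of an Exp(1) variable is 1, so the sample mean should be close to 1.
print(draws.shape, round(float(draws.mean()), 2))
```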
Extract a random sample from the database, with replacement. Useful for bootstrapping.
Args:
size: size of the sample. If None, a sample of the same size as the database will be generated.
Returns:
pandas dataframe with the sample.
myData.sampleWithReplacement()
myData.sampleWithReplacement(6)
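Sampling with replacement means that the same row may appear several times, which is why a sample can be larger than the database itself. A rough pandas equivalent, given here as an illustration only, uses `DataFrame.sample` with `replace=True`; the fixed `random_state` is there to make the sketch reproducible.

```python
import pandas as pd

df = pd.DataFrame({'Person': [1, 1, 1, 2, 2],
                   'Variable1': [1, 2, 3, 4, 5]})

# Draw 6 rows from a 5-row table: only possible with replacement.
sample = df.sample(n=6, replace=True, random_state=0)
print(len(sample))  # 6
```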
Defines the data as panel data
Args:
columnName: name of the column that identifies individuals.
myPanelData = db.Database('test',df)
# Data is not considered panel yet
myPanelData.isPanel()
myPanelData.panel('Person')
# Now it is panel
print(myPanelData.isPanel())
print(myPanelData)
When draws are generated for panel data, a set of draws is generated per person, not per observation.
randomDraws1 = bioDraws('randomDraws1','NORMAL')
randomDraws2 = bioDraws('randomDraws2','UNIFORM_HALTON3')
# We build an expression that involves the two random variables
x = randomDraws1 + randomDraws2
types = x.dictOfDraws()
theDrawsTable = myPanelData.generateDraws(types,['randomDraws1','randomDraws2'],10)
print(theDrawsTable)
Reports the number of observations in the database. Note that it returns the same value whether or not the database contains panel data.
Returns:
Number of observations.
See: getSampleSize()
myData.getNumberOfObservations()
myPanelData.getNumberOfObservations()
Reports the size of the sample. If the data is cross-sectional, it is the number of observations in the database. If the data is panel, it is the number of individuals.
Returns:
Sample size.
See: getNumberOfObservations()
myData.getSampleSize()
myPanelData.getSampleSize()
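The distinction between the two quantities can be illustrated on the original five-row example data with pandas: the number of observations is the number of rows, while the panel sample size is the number of distinct individuals. The variable names below are chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Person': [1, 1, 1, 2, 2]})

# Number of observations: one per row, panel or not.
n_observations = len(df)

# Panel sample size: one per distinct individual.
n_individuals = df['Person'].nunique()

print(n_observations, n_individuals)  # 5 2
```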
Extract a random sample of the individual map from a panel data database, with replacement. Useful for bootstrapping.
Args:
size: size of the sample. If None, a sample of the same size as the database will be generated.
Returns:
pandas dataframe with the sample.
myPanelData.sampleIndividualMapWithReplacement(10)
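For panel data, bootstrapping resamples individuals rather than single observations, so that all rows of a person stay together. The sketch below assumes that the individual map records the first and last row of each person; that structure, and the column names `first` and `last`, are assumptions made for illustration and may differ from Biogeme's internal representation.

```python
import pandas as pd

df = pd.DataFrame({'Person': [1, 1, 1, 2, 2]})

# Hypothetical individual map: first and last row index of each person.
rows = df.reset_index().groupby('Person')['index']
individual_map = pd.DataFrame({'first': rows.min(), 'last': rows.max()})

# Resample individuals (not rows) with replacement.
sample = individual_map.sample(n=10, replace=True, random_state=0)
print(individual_map)
print(len(sample))  # 10
```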