Biogeme
The core routines of Biogeme.
biogeme.biogeme module
Implementation of the main Biogeme class
- author:
- Michel Bierlaire 
- date:
- Tue Mar 26 16:45:15 2019 
It combines the database and the model specification.
- class biogeme.biogeme.BIOGEME(database, formulas, userNotes=None, parameter_file=None, skip_audit=False, **kwargs)[source]
- Bases: object
- Main class that combines the database and the model specification.
- It works in two modes: estimation and simulation.
 - __init__(database, formulas, userNotes=None, parameter_file=None, skip_audit=False, **kwargs)[source]
- Constructor - Parameters:
- database (biogeme.database.Database) – choice data.
- formulas (biogeme.expressions.Expression, or dict(biogeme.expressions.Expression)) – expression or dictionary of expressions that define the model specification. Each expression is applied to each entry of the database. The keys of the dictionary give a name to each formula. In estimation mode, two formulas are needed, with the keys ‘loglike’ and ‘weight’. If only one formula is provided, it is associated with the label ‘loglike’. If no formula is labeled ‘weight’, the weight of each observation is assumed to be 1.0. In simulation mode, the label of each formula is used as the label of the corresponding column in the resulting database.
- userNotes (str) – these notes will be included in the report file. 
- parameter_file (str) – name of the .toml file where the parameters are read 
 
- Raises:
- BiogemeError – an audit of the formulas is performed. If a formula has issues, an error is detected and an exception is raised. 
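The labeling rules described for the formulas argument can be sketched in plain Python. This is only an illustration under stated assumptions: formulas are modelled as ordinary callables applied to one row, and normalise_formulas and weight_of are hypothetical helpers, not part of Biogeme.

```python
def normalise_formulas(formulas):
    """Return a dict of formulas keyed by label (hypothetical helper)."""
    if isinstance(formulas, dict):
        return dict(formulas)
    # A single formula is associated with the label 'loglike'.
    return {'loglike': formulas}


def weight_of(formulas, row):
    """Weight of one row: 1.0 when no formula is labeled 'weight'."""
    weight = formulas.get('weight')
    return 1.0 if weight is None else weight(row)


# Assumption: a formula is a plain callable taking one row (a dict).
single = normalise_formulas(lambda row: -0.5 * row['x'])
both = normalise_formulas(
    {'loglike': lambda row: -row['x'], 'weight': lambda row: row['w']}
)

print(sorted(single), weight_of(single, {'x': 2.0}))
print(sorted(both), weight_of(both, {'x': 2.0, 'w': 3.0}))
```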
 
 - property algorithm_name
- Name of the optimization algorithm 
 - argument_warning()[source]
- Displays a deprecation warning when parameters are provided as arguments. 
 - bestIteration
- Store the best iteration found so far. 
 - beta_values_dict_to_list(beta_dict=None)[source]
- Transforms a dict with the names of the betas associated with their values, into a list consistent with the numbering of the ids.
 - Parameters:
- beta_dict (dict(str: float)) – dict with the values of the parameters 
- Raises:
- BiogemeError – if the parameter is not a dict 
- BiogemeError – if a parameter is missing in the dict 
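A minimal sketch of this kind of conversion, assuming the numbering of the ids is available as an ordered list of names. The function beta_dict_to_list and its ordered_names argument are hypothetical, and built-in exceptions stand in for BiogemeError.

```python
def beta_dict_to_list(beta_dict, ordered_names):
    """Order the values of beta_dict according to ordered_names."""
    if not isinstance(beta_dict, dict):
        # Stand-in for the BiogemeError raised when the argument is not a dict.
        raise TypeError('A dictionary must be provided.')
    missing = [name for name in ordered_names if name not in beta_dict]
    if missing:
        # Stand-in for the BiogemeError raised when a parameter is missing.
        raise KeyError(f'Missing parameters: {missing}')
    return [beta_dict[name] for name in ordered_names]


# The order of the result follows ordered_names, not the dict.
values = beta_dict_to_list({'b_cost': -0.1, 'b_time': -1.5}, ['b_time', 'b_cost'])
print(values)
```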
 
 
 - bootstrap_results
- Results of the bootstrap calculation. 
 - bootstrap_time
- Time needed to calculate the bootstrap standard errors 
 - calculateInitLikelihood()[source]
- Calculate the value of the log likelihood function.
- The default values of the parameters are used.
- Returns:
- value of the log likelihood. 
- Return type:
- float. 
 
 - calculateLikelihood(x, scaled, batch=None)[source]
- Calculates the value of the log likelihood function - Parameters:
- x (list(float)) – vector of values for the parameters. 
- scaled (bool) – if True, the value is divided by the number of observations used to calculate it, so that values obtained with different sample sizes are comparable.
- batch (float) – if not None, calculates the likelihood on a random sample of the data. The value of the parameter must be strictly between 0 and 1, and represents the share of the data that will be used. Default: None 
 
- Returns:
- the calculated value of the log likelihood 
- Return type:
- float. 
- Raises:
- ValueError – if the length of the list x is incorrect. 
- BiogemeError – if calculation with batch is requested 
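The effect of the scaled argument can be illustrated with a toy calculation. The sketch below assumes that the log likelihood is the sum of per-observation contributions. The function log_likelihood is hypothetical and does not reproduce Biogeme's implementation.

```python
def log_likelihood(contributions, scaled):
    """Sum the per-observation contributions, optionally dividing by their number."""
    total = sum(contributions)
    if scaled:
        return total / len(contributions)
    return total


small_sample = [-1.0, -2.0, -3.0]
large_sample = small_sample * 4  # same contributions, four times as many rows

# Unscaled values grow with the sample size; scaled values stay comparable.
print(log_likelihood(small_sample, scaled=False), log_likelihood(large_sample, scaled=False))
print(log_likelihood(small_sample, scaled=True), log_likelihood(large_sample, scaled=True))
```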
 
 
 - calculateLikelihoodAndDerivatives(x, scaled, hessian=False, bhhh=False, batch=None)[source]
- Calculate the value of the log likelihood function and its derivatives. - Parameters:
- x (list(float)) – vector of values for the parameters. 
- scaled (bool) – if True, the results are divided by the number of observations. 
- hessian (bool) – if True, the hessian is calculated. Default: False. 
- bhhh (bool) – if True, the BHHH matrix is calculated. Default: False. 
- batch (float) – if not None, calculates the likelihood on a random sample of the data. The value of the parameter must be strictly between 0 and 1, and represents the share of the data that will be used. Default: None 
 
- Returns:
- f, g, h, bh where - f is the value of the function (float) 
- g is the gradient (numpy.array) 
- h is the hessian (numpy.array) 
- bh is the BHHH matrix (numpy.array) 
 
- Return type:
- tuple float, numpy.array, numpy.array, numpy.array 
- Raises:
- ValueError – if the length of the list x is incorrect 
- BiogemeError – if the norm of the gradient is not finite, an error is raised. 
- BiogemeError – if calculation with batch is requested 
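For readers unfamiliar with the BHHH matrix, the sketch below uses its usual definition: the sum, over observations, of the outer product of the gradient of each observation's log likelihood contribution with itself. Whether Biogeme applies weights or scaling on top of this is not stated here, and bhhh_matrix is a hypothetical helper.

```python
def bhhh_matrix(gradients):
    """Sum of the outer products g g^T of the per-observation gradients."""
    size = len(gradients[0])
    bh = [[0.0] * size for _ in range(size)]
    for g in gradients:
        for i in range(size):
            for j in range(size):
                bh[i][j] += g[i] * g[j]
    return bh


# Two observations, two parameters (illustrative numbers).
per_observation_gradients = [[1.0, 2.0], [0.5, -1.0]]
bh = bhhh_matrix(per_observation_gradients)
print(bh)
```

By construction the result is symmetric and positive semidefinite, which is why it is often used as a substitute for the negative of the hessian.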
 
 
 - calculateNullLoglikelihood(avail)[source]
- Calculate the log likelihood of the null model that predicts equal probability for each alternative - Parameters:
- avail (list of biogeme.expressions.Expression) – list of expressions to evaluate the availability conditions for each alternative. If None, all alternatives are always available.
- Returns:
- value of the log likelihood 
- Return type:
- float 
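As an illustration of the null model, each observation contributes the logarithm of one over the number of alternatives available to it. The sketch below assumes that availabilities have already been evaluated into rows of 0/1 values. The function null_log_likelihood is hypothetical.

```python
import math


def null_log_likelihood(availability_rows):
    """Log likelihood of a model giving equal probability to each available alternative."""
    total = 0.0
    for row in availability_rows:
        number_available = sum(1 for available in row if available)
        # Equal probabilities: 1 / number_available for each available alternative.
        total += -math.log(number_available)
    return total


# First observation: three alternatives available; second: only two.
rows = [[1, 1, 1], [1, 0, 1]]
null_value = null_log_likelihood(rows)
print(null_value)
```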
 
 - changeInitValues(betas)[source]
- Modifies the initial values of the parameters in all formulas - Parameters:
- betas (dict(string:float)) – dictionary where the keys are the names of the parameters, and the values are the new value for the parameters. 
 
 - checkDerivatives(beta, verbose=False)[source]
- Verifies the implementation of the derivatives. - It compares the analytical version with the finite differences approximation. - Parameters:
- beta (list(float)) – vector of values for the parameters. 
- verbose (bool) – if True, the comparisons are reported. Default: False. 
 
- Return type:
- tuple. 
- Returns:
- f, g, h, gdiff, hdiff where - f is the value of the function, 
- g is the analytical gradient, 
- h is the analytical hessian, 
- gdiff is the difference between the analytical and the finite differences gradient, 
- hdiff is the difference between the analytical and the finite differences hessian, 
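The comparison between analytical and finite-difference derivatives can be sketched for the gradient as follows. This is an illustration on a toy function using forward differences, not Biogeme's implementation. The names f, analytical_gradient and finite_difference_gradient are assumptions made for the example.

```python
def finite_difference_gradient(function, x, step=1e-6):
    """Forward-difference approximation of the gradient of function at x."""
    base = function(x)
    gradient = []
    for i in range(len(x)):
        shifted = list(x)
        shifted[i] += step
        gradient.append((function(shifted) - base) / step)
    return gradient


def f(x):
    """Toy concave function standing in for a log likelihood."""
    return -((x[0] - 1.0) ** 2) - 2.0 * (x[1] + 0.5) ** 2


def analytical_gradient(x):
    """Hand-derived gradient of f."""
    return [-2.0 * (x[0] - 1.0), -4.0 * (x[1] + 0.5)]


x = [0.3, 0.2]
g = analytical_gradient(x)
g_fd = finite_difference_gradient(f, x)
# Small entries in gdiff suggest that the analytical gradient is correct.
gdiff = [a - b for a, b in zip(g, g_fd)]
print(gdiff)
```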
 
 
 - confidenceIntervals(betaValues, intervalSize=0.9)[source]
- Calculate confidence intervals on the simulated quantities - Parameters:
- betaValues (list(dict(str: float))) – array of parameters values to be used in the calculations. Typically, it is a sample drawn from a distribution. 
- intervalSize (float) – size of the reported confidence interval, expressed as a fraction between 0 and 1. If it is denoted by s, the interval is calculated for the quantiles (1-s)/2 and (1+s)/2. The default (0.9) corresponds to the quantiles 0.05 and 0.95. 
 
- Returns:
- two pandas data frames ‘left’ and ‘right’ with the same dimensions. Each row corresponds to a row in the database, and each column to a formula. ‘left’ contains the left value of the confidence interval, and ‘right’ the right value.
- Example:
    # Read the estimation results from a file
    results = res.bioResults(pickleFile='myModel.pickle')
    # Retrieve the names of the beta parameters that have been estimated
    betas = biogeme.freeBetaNames()
    # Draw 100 realizations of the distribution of the estimators
    b = results.getBetasForSensitivityAnalysis(betas, size=100)
    # Simulate the formulas using the nominal values
    # (betaValues: dict mapping parameter names to values)
    simulatedValues = biogeme.simulate(betaValues)
    # Calculate the confidence intervals for each formula
    left, right = biogeme.confidenceIntervals(b, 0.9)
- Return type:
- tuple of two Pandas dataframes. 
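The quantile rule (1-s)/2 and (1+s)/2 can be illustrated for a single simulated quantity evaluated over many draws of the parameters. The sketch below uses a simple linear-interpolation quantile. The helpers quantile and confidence_interval are hypothetical, and the interpolation rule is an assumption rather than the one used by Biogeme.

```python
def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted list."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def confidence_interval(simulated_values, interval_size=0.9):
    """Bounds given by the quantiles (1-s)/2 and (1+s)/2 of the simulated values."""
    ordered = sorted(simulated_values)
    return (
        quantile(ordered, (1.0 - interval_size) / 2.0),
        quantile(ordered, (1.0 + interval_size) / 2.0),
    )


# One simulated quantity evaluated for 101 draws (illustrative values 0, 1, ..., 100).
draws = [float(v) for v in range(101)]
left, right = confidence_interval(draws, 0.9)
print(left, right)
```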
 
 - database
- biogeme.database.Database object
 - property dogleg
- getter for the parameter 
 - drawsProcessingTime
- Time needed to generate the draws. 
 - property enlarging_factor
- getter for the parameter 
 - estimate(recycle=False, bootstrap=0, **kwargs)[source]
- Estimate the parameters of the model(s). - Parameters:
- recycle (bool) – if True, the results are read from the pickle file, if it exists. If False, the estimation is performed. 
- bootstrap (int) – number of bootstrap resamples used to calculate the variance-covariance matrix. If the number is 0, bootstrapping is not applied. Default: 0. 
 
- Returns:
- object containing the estimation results. 
- Return type:
- biogeme.bioResults 
 - Example:
    # Create an instance of biogeme
    biogeme = bio.BIOGEME(database, logprob)
    # Give a name to the model
    biogeme.modelName = 'mymodel'
    # Estimate the parameters
    results = biogeme.estimate()
- Raises:
- BiogemeError – if no expression has been provided for the likelihood 
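The idea behind the bootstrap argument can be sketched on a toy estimator: the data is resampled with replacement, the estimator is recomputed on each resample, and the variance of the resulting estimates is reported. The sample mean stands in for the model estimation here, and bootstrap_variance is a hypothetical helper, not the procedure implemented in Biogeme.

```python
import random
import statistics


def bootstrap_variance(data, estimator, n_resamples, seed=0):
    """Variance of an estimator across bootstrap resamples of the data."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_resamples):
        # Draw as many rows as in the original data, with replacement.
        resample = [rng.choice(data) for _ in data]
        estimates.append(estimator(resample))
    return statistics.variance(estimates)


data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
variance = bootstrap_variance(data, statistics.mean, n_resamples=200)
print(variance)
```

For a vector of parameters, the same procedure yields a variance-covariance matrix instead of a single variance.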
 
 - estimate_catalog(selected_configurations=None, quick_estimate=False, recycle=False, bootstrap=0)[source]
- Estimate all or selected versions of a model with Catalogs, corresponding to multiple specifications. - Parameters:
- selected_configurations (set(biogeme.pareto.SetElement)) – set of configurations. If None, all configurations are considered.
- quick_estimate (bool) – if True, the final statistics are not calculated. 
- recycle (bool) – if True, the results are read from the pickle file, if it exists. If False, the estimation is performed. 
- bootstrap (int) – number of bootstrap resamples used to calculate the variance-covariance matrix. If the number is 0, bootstrapping is not applied. Default: 0. 
 
- Returns:
- object containing the estimation results associated with the name of each specification, as well as a description of each configuration 
- Return type:
- dict(str: bioResults) 
 
 - files_of_type(extension, all_files=False)[source]
- Identify the list of files with a given extension in the local directory - Parameters:
- extension (str) – extension of the requested files (without the dot): ‘pickle’, or ‘html’ 
- all_files (bool) – if all_files is False, only files containing the name of the model are identified. If all_files is True, all files with the requested extension are identified. 
 
- Returns:
- list of files with the requested extension. 
- Return type:
- list(str) 
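The selection rule for files can be sketched with pathlib. This is an illustration under assumptions: the directory and the model name are passed explicitly, and the match on the model name is done with a glob pattern. The standalone function files_of_type below is hypothetical and is not the method documented above.

```python
import tempfile
from pathlib import Path


def files_of_type(directory, extension, model_name, all_files=False):
    """List file names with the extension, restricted to the model unless all_files."""
    pattern = f'*.{extension}' if all_files else f'*{model_name}*.{extension}'
    return sorted(path.name for path in Path(directory).glob(pattern))


# Demonstration in a temporary directory with illustrative file names.
with tempfile.TemporaryDirectory() as directory:
    for name in ('mymodel.html', 'mymodel~00.html', 'other.html', 'mymodel.pickle'):
        (Path(directory) / name).touch()
    model_files = files_of_type(directory, 'html', 'mymodel')
    every_html = files_of_type(directory, 'html', 'mymodel', all_files=True)

print(model_files)
print(every_html)
```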
 
 - formulas
- Dictionary containing Biogeme formulas of type biogeme.expressions.Expression. The keys are the names of the formulas.
 - freeBetaNames()[source]
- Returns the names of the parameters that must be estimated - Returns:
- list of names of the parameters 
- Return type:
- list(str) 
 
 - property generateHtml
- Boolean variable, True if the HTML file with the results must be generated. 
 - property generatePickle
- Boolean variable, True if the PICKLE file with the results must be generated. 
 - property generate_html
- Boolean variable, True if the HTML file with the results must be generated. 
 - property generate_pickle
- Boolean variable, True if the PICKLE file with the results must be generated. 
 - getBoundsOnBeta(betaName)[source]
- Returns the bounds on the parameter as defined by the user. - Parameters:
- betaName (string) – name of the parameter 
- Returns:
- lower bound, upper bound 
- Return type:
- tuple 
- Raises:
- BiogemeError – if the name of the parameter is not found. 
 
 - property identification_threshold
- Threshold for the eigenvalue to trigger an identification warning 
 - property infeasible_cg
- getter for the parameter 
 - initLogLike
- Init value of the likelihood function 
 - property initial_radius
- getter for the parameter 
 - lastSample
- keeps track of the sample of data used to calculate the stochastic gradient / hessian 
 - likelihoodFiniteDifferenceHessian(x)[source]
- Calculate the hessian of the log likelihood function using finite differences. - May be useful when the analytical hessian has numerical issues. - Parameters:
- x (list(float)) – vector of values for the parameters. 
- Returns:
- finite differences approximation of the hessian. 
- Return type:
- numpy.array 
- Raises:
- ValueError – if the length of the list x is incorrect 
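A finite-difference hessian can be sketched with central differences on a toy function. The step size and the differencing scheme below are assumptions made for the illustration, and finite_difference_hessian is a hypothetical helper, not Biogeme's implementation.

```python
def finite_difference_hessian(function, x, step=1e-4):
    """Central-difference approximation of the hessian of function at x."""
    n = len(x)
    hessian = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):

            def shifted(delta_i, delta_j):
                point = list(x)
                point[i] += delta_i
                point[j] += delta_j
                return function(point)

            hessian[i][j] = (
                shifted(step, step)
                - shifted(step, -step)
                - shifted(-step, step)
                + shifted(-step, -step)
            ) / (4.0 * step * step)
    return hessian


def f(x):
    """Toy concave function with hessian [[-2, 1], [1, -4]]."""
    return -((x[0] - 1.0) ** 2) - 2.0 * (x[1] + 0.5) ** 2 + x[0] * x[1]


h = finite_difference_hessian(f, [0.3, 0.2])
print(h)
```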
 
 - loglike
- Object of type biogeme.expressions.Expression calculating the formula for the loglikelihood
 - loglikeName
- Keyword used for the name of the loglikelihood formula. Default: ‘loglike’ 
 - loglikeSignatures
- Internal signature of the formula for the loglikelihood. 
 - property maximum_number_catalog_expressions
- Maximum number of multiple expressions when Catalogs are used. 
 - property maxiter
- getter for the parameter 
 - property missingData
- Code for missing data 
 - property missing_data
- Code for missing data 
 - modelName
- Name of the model. Default: ‘biogemeModelDefaultName’ 
 - monteCarlo
- monteCarlo is True if one of the expressions involves a Monte-Carlo integration.
 - nullLogLike
- Log likelihood of the null model 
 - property numberOfDraws
- Number of draws for Monte-Carlo integration. 
 - property numberOfThreads
- Number of threads used for parallel computing. Default: the number of available CPU. 
 - property number_of_draws
- Number of draws for Monte-Carlo integration. 
 - property number_of_threads
- Number of threads used for parallel computing. Default: the number of available CPU. 
 - property only_robust_stats
- True if only the robust statistics need to be reported. If False, the statistics from the Rao-Cramer bound are also reported. 
 - optimizationMessages
- Information provided by the optimization algorithm after completion. 
 - optimize(startingValues=None)[source]
- Calls the optimization algorithm. The function self.algorithm is called. - Parameters:
- startingValues (list(float)) – starting point for the algorithm 
- Returns:
- x, messages - x is the solution generated by the algorithm, 
- messages is a dictionary containing information reported by the algorithm 
 
- Return type:
- numpy.array, dict(str:object) 
- Raises:
- BiogemeError – an error is raised if no algorithm is specified. 
 
 - quickEstimate(**kwargs)[source]
- Estimate the parameters of the model. Same as estimate, except that any extra calculation is skipped (initial log likelihood, t-statistics, etc.). - Returns:
- object containing the estimation results. 
- Return type:
- biogeme.bioResults 
 - Example:
    # Create an instance of biogeme
    biogeme = bio.BIOGEME(database, logprob)
    # Give a name to the model
    biogeme.modelName = 'mymodel'
    # Estimate the parameters
    results = biogeme.quickEstimate()
- Raises:
- BiogemeError – if no expression has been provided for the likelihood 
 
 - property saveIterations
- If True, the current iterate is saved after each iteration, in a file named __[modelName].iter, where [modelName] is the name given to the model. If such a file exists, the starting values for the estimation are replaced by the values saved in the file.
 - property save_iterations
- Same as saveIterations, with another syntax 
 - property second_derivatives
- getter for the parameter 
 - property seed_param
- getter for the parameter 
 - setRandomInitValues(defaultBound=100.0)[source]
- Modifies the initial values of the parameters in all formulas, using randomly generated values. The value is drawn from a uniform distribution on the interval defined by the bounds. - Parameters:
- defaultBound (float) – If the upper bound is missing, it is replaced by this value. If the lower bound is missing, it is replaced by the opposite of this value. Default: 100. 
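The rule for missing bounds can be sketched as follows. This is an illustration only: bounds are represented by None when missing, and random_initial_value is a hypothetical helper that handles a single parameter.

```python
import random


def random_initial_value(lower, upper, default_bound=100.0, rng=random):
    """Draw a value uniformly between the bounds, replacing missing bounds."""
    if lower is None:
        lower = -default_bound  # missing lower bound: opposite of the default
    if upper is None:
        upper = default_bound  # missing upper bound: the default
    return rng.uniform(lower, upper)


rng = random.Random(42)  # fixed seed so that the illustration is reproducible
unbounded = random_initial_value(None, None, rng=rng)
bounded = random_initial_value(0.0, 1.0, rng=rng)
half_bounded = random_initial_value(0.0, None, default_bound=10.0, rng=rng)
print(unbounded, bounded, half_bounded)
```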
 
 - short_names
 - simulate(theBetaValues)[source]
- Applies the formulas to each row of the database. - Parameters:
- theBetaValues (dict(str, float)) – values of the parameters to be used in the calculations. They must be provided explicitly: a BiogemeError is raised if theBetaValues is None. 
- Returns:
- a pandas data frame with the simulated value. Each row corresponds to a row in the database, and each column to a formula. 
- Return type:
- Pandas data frame 
 - Example:
    # Read the estimation results from a file
    results = res.bioResults(pickleFile='myModel.pickle')
    # Simulate the formulas using the nominal values
    # (betaValues: dict mapping parameter names to values)
    simulatedValues = biogeme.simulate(betaValues)
- Raises:
- BiogemeError – if the number of parameters is incorrect 
- BiogemeError – if theBetaValues is None. 
 
 
 - property steptol
- getter for the parameter 
 - property tolerance
- getter for the parameter 
 - userNotes
- User notes 
 - validate(estimationResults, validationData)[source]
- Perform out-of-sample validation.
- The function performs the following tasks:
- each slice defines a validation set (the slice itself) and an estimation set (the rest of the data), 
- the model is re-estimated on the estimation set, 
- the estimated model is applied on the validation set, 
- the value of the log likelihood for each observation is reported. 
 - Parameters:
- estimationResults (biogeme.results.bioResults) – results of the model estimation based on the full data. 
- validationData (list(tuple(pandas.DataFrame, pandas.DataFrame))) – list of estimation and validation data sets 
 
- Returns:
- a list containing as many items as slices. Each item is the result of the simulation on the validation set. 
- Return type:
- list(pandas.DataFrame) 
- Raises:
- BiogemeError – An error is raised if the database is structured as panel data. 
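The structure expected for validationData, a list of (estimation set, validation set) pairs, can be sketched with plain lists standing in for data frames. The function split_into_slices is a hypothetical helper, and the interleaved assignment of rows to slices is an assumption. In practice the rows would typically be shuffled first.

```python
def split_into_slices(rows, slices):
    """Build (estimation, validation) pairs, one per slice of the data."""
    folds = [rows[i::slices] for i in range(slices)]
    pairs = []
    for index, validation in enumerate(folds):
        # The estimation set is everything that is not in the current slice.
        estimation = [
            row
            for other, fold in enumerate(folds)
            if other != index
            for row in fold
        ]
        pairs.append((estimation, validation))
    return pairs


rows = list(range(10))  # stand-in for the rows of the database
pairs = split_into_slices(rows, slices=5)
for estimation, validation in pairs:
    print(len(estimation), validation)
```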
 
 - weight
- Object of type biogeme.expressions.Expression calculating the weight of each observation in the sample.
 - weightName
- Keyword used for the name of the weight formula. Default: ‘weight’ 
 - weightSignatures
- Internal signature of the formula for the weight.