Biogeme

Biogeme users' group

If you need help, submit your questions to the users' group:

groups.google.com/d/forum/biogeme

The forum is moderated. Please keep the following in mind before posting a question:

Check that the same question has not already been addressed on the forum.
Try to submit only questions about the software.
Read the documentation completely and try the examples before submitting a question.
Do not submit large files (typically, data files) to the forum.

Important notice: the discussion group on Yahoo! is now obsolete. Although it has not been closed, it is no longer active. Use the Google forum mentioned above instead.

Frequently Asked Questions

Why is the file headers.py not generated?
What initial values should I select for the parameters?
Can I save intermediate iterations during the estimation?
Does Biogeme provide support for out-of-sample validation?
The init loglikelihood is -1.797693e+308 and no iteration is performed. What should I do?

Why is the file headers.py not generated?

In order to comply better with good programming practice in Python, the syntax to import the variable names from the data file has been modified since version 3.2.5. The file headers.py is not generated anymore. The statement

from headers import *

must be replaced by

globals().update(database.variables)

where database is the object containing the database, created as follows:

import pandas as pd
import biogeme.database as db
df = pd.read_csv("swissmetro.dat", sep='\t')
database = db.Database("swissmetro", df)

Moreover, in order to avoid any ambiguity, the operators used by Biogeme must be explicitly imported. For instance:

from biogeme.expressions import Beta, DefineVariable, bioDraws, PanelLikelihoodTrajectory, MonteCarlo, log

Note that it is also possible to import all of them using the following syntax:

from biogeme.expressions import *

although this is not good Python programming practice.

What initial values should I select for the parameters?

If you have the results of a previous estimation, it may be a good idea to use the estimated values as a starting point for the estimation of a similar model. If not, it depends on the nature of the parameters:

If the parameter is a coefficient (traditionally denoted by β), the value 0 is appropriate.
If the parameter is a nest parameter of a nested or cross-nested logit model (traditionally denoted by μ), the value 1 is appropriate. Make sure to set the lower bound of the parameter to 1.
If the parameter is the nest membership coefficient of a cross-nested logit model (traditionally denoted by α), the value 0.5 is appropriate. Make sure to set the lower bound to 0 and the upper bound to 1.
If the parameter captures the membership to a class of a latent class model, the value 0.5 is appropriate. Make sure to set the lower bound to 0 and the upper bound to 1.
If the parameter is the scale of an error component in a mixture of logit model (traditionally denoted by σ), the value must be sufficiently large so that the likelihood of each observation is not too close to zero. Try the value 1 first. If there are numerical issues, try a larger value, such as 10. See Section 7 in the report "Estimating choice models with latent variables with PandasBiogeme" for a detailed discussion.
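
The recommendations above can be summarized as a small lookup table. The following is a minimal sketch in plain Python and is not part of Biogeme: the names SUGGESTED_START and beta_arguments are hypothetical. The returned tuple is merely ordered like the arguments of Beta (name, initial value, lower bound, upper bound, status), with None meaning "no bound" and status 0 meaning that the parameter is estimated.

```python
# Hypothetical helper, for illustration only: it encodes the starting
# values and bounds suggested in the list above.
SUGGESTED_START = {
    # kind: (initial value, lower bound, upper bound); None means "no bound"
    "coefficient": (0.0, None, None),            # beta
    "nest": (1.0, 1.0, None),                    # mu
    "nest_membership": (0.5, 0.0, 1.0),          # alpha
    "class_membership": (0.5, 0.0, 1.0),
    "error_component_scale": (1.0, None, None),  # sigma; try 10 if numerical issues arise
}


def beta_arguments(name, kind):
    """Return (name, value, lower, upper, status) for a parameter of the given kind."""
    value, lower, upper = SUGGESTED_START[kind]
    return (name, value, lower, upper, 0)  # status 0: the parameter is estimated


print(beta_arguments("MU", "nest"))
print(beta_arguments("ALPHA_A", "nest_membership"))
```

With Biogeme installed, such a tuple could presumably be unpacked into a parameter definition, for instance Beta(*beta_arguments("MU", "nest")).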

Can I save intermediate iterations during the estimation?

Yes. Use the parameter saveIterations of the estimate function (see the documentation). Each time the value of the log likelihood function improves, Biogeme saves the current value of all parameters in a file. The name of the file is defined by the parameter file_iterations. By default, it is __savedIterations.txt. To start the iterations from the values in the file, call the function loadSavedIteration (see the documentation) before starting the estimation.

Example:

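	    import biogeme.biogeme as bio

	    biogeme = bio.BIOGEME(database, logprob)
	    fname = "myIterations.log"
	    # Start from the parameter values saved in the file.
	    biogeme.loadSavedIteration(filename=fname)
	    # Save the parameters each time the log likelihood improves.
	    results = biogeme.estimate(saveIterations=True, file_iterations=fname)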

Does Biogeme provide support for out-of-sample validation?

Yes.

You can extract a random sample of the database (see documentation).
You can split your database into slices, and generate estimation and validation data sets (see documentation).
You can use the validate function (see documentation) to perform a full validation. The validation consists of organizing the data into several randomly defined slices of about the same size. Each slice is used in turn as a validation data set: the model is re-estimated using all the data except the slice, and the estimated model is applied to the slice. The value of the log likelihood for each observation in the validation set is reported in a dataframe. As this is done for each slice, the output is a list of dataframes, one per slice. See example 04validation.py.
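
The slicing logic described above can be illustrated without Biogeme. The following is a minimal sketch in plain Python, not Biogeme's implementation: the function make_slices and its parameters are assumptions made for illustration. It shuffles the row indices, cuts them into slices of about the same size, and pairs each validation slice with the remaining rows, which would be used for estimation.

```python
import random


def make_slices(n_rows, n_slices, seed=0):
    """Return a list of (estimation_rows, validation_rows) pairs."""
    rows = list(range(n_rows))
    random.Random(seed).shuffle(rows)  # slices are randomly defined
    # Taking every n_slices-th row yields slices whose sizes differ by at most one.
    slices = [rows[i::n_slices] for i in range(n_slices)]
    pairs = []
    for k, validation in enumerate(slices):
        # Estimation set: all the rows except those of slice k.
        estimation = [r for j, s in enumerate(slices) if j != k for r in s]
        pairs.append((sorted(estimation), sorted(validation)))
    return pairs


for estimation, validation in make_slices(n_rows=10, n_slices=5):
    print(f"estimate on {estimation}, validate on {validation}")
```

Every row appears in exactly one validation slice, so each observation receives one out-of-sample log likelihood value.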



      
The init loglikelihood is -1.797693e+308 and no iteration is performed. What should I do?
	
If the model returns a probability of 0 for the chosen alternative for at least one observation in the sample, then the likelihood is 0, and the log likelihood is minus infinity. For the sake of robustness, Biogeme assigns the value -1.797693e+308 to the log likelihood in this context.
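
The value -1.797693e+308 is the negative of the largest finite double-precision number, which can be checked in plain Python. The sketch below only illustrates the idea and is not Biogeme's code: the function robust_loglikelihood is hypothetical.

```python
import math
import sys


def robust_loglikelihood(chosen_probabilities):
    """Sum of log probabilities, with a finite stand-in for minus infinity."""
    if any(p <= 0.0 for p in chosen_probabilities):
        # log(0) is undefined: a single zero probability makes the
        # likelihood 0, so return the most negative finite float instead.
        return -sys.float_info.max
    return sum(math.log(p) for p in chosen_probabilities)


print(f"{robust_loglikelihood([0.6, 0.0, 0.3]):e}")  # prints -1.797693e+308
print(robust_loglikelihood([0.5, 0.5]))
```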
	  
A common reason why the model returns a probability of 0 for the chosen alternative is that this alternative is declared unavailable. The following code verifies that the availability conditions are compatible with the choice variable:
	    
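	    import numpy as np

	    # One boolean per observation: is the chosen alternative available?
	    diagnostic = database.checkAvailabilityOfChosenAlt(av, CHOICE)

	    if not diagnostic.all():
	        # Positions of the rows where the chosen alternative is unavailable.
	        row_indices = np.where(diagnostic == False)[0]
	        print(f'Rows where the chosen alternative is not available: {row_indices}')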
	      
	    
See the documentation of checkAvailabilityOfChosenAlt.

Another reason is that the initial value of a scale parameter is too close to zero. See the discussion of initial values above.

There are many other possible reasons. The best way to investigate the source of the problem is to use Biogeme in simulation mode, and report the probability of the chosen alternative for each observation. Once you have identified the problematic entries, it is easier to investigate why the model returns a probability of zero.

Getting help