Biogeme

Biogeme users' group

If you need help, submit your questions to the users' group:

groups.google.com/d/forum/biogeme

The forum is moderated. Please keep the following in mind before posting a question:

Check that the same question has not already been addressed on the forum.
Try to submit only questions about the software.
Read the documentation thoroughly and try the examples before submitting a question.
Do not submit large files (typically, data files) to the forum.

Important notice: the discussion group on Yahoo! is now obsolete. Although it has not been closed, it is no longer active. Use the above-mentioned Google forum instead.

Frequently Asked Questions

Why is the file headers.py not generated?
What initial values should I select for the parameters?
Can I save intermediate iterations during the estimation?
Does Biogeme provide support for out-of-sample validation?
The init loglikelihood is -1.797693e+308 and no iteration is performed. What should I do?

Why is the file headers.py not generated?

In order to comply better with good programming practice in Python, the syntax to import the variable names from the data file has been modified since version 3.2.5. The file headers.py is not generated anymore. The statement

from headers import *

must be replaced by

globals().update(database.variables)

where database is the object containing the database, created as follows:

import pandas as pd
import biogeme.database as db

df = pd.read_csv("swissmetro.dat", sep='\t')
database = db.Database("swissmetro", df)

Moreover, in order to avoid any ambiguity, the operators used by Biogeme must be explicitly imported. For instance:

from biogeme.expressions import Beta, DefineVariable, bioDraws, PanelLikelihoodTrajectory, MonteCarlo, log

Note that it is also possible to import all of them using the following syntax

from biogeme.expressions import *

although this is not a good Python programming practice.
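
To see what the statement globals().update(database.variables) does, here is a minimal sketch in plain Python. It assumes only that database.variables behaves like a dictionary mapping column names to objects. The dictionary named variables and its string values are hypothetical stand-ins, not Biogeme objects.

```python
# Hypothetical stand-in for database.variables: a mapping from
# column names to objects (plain strings here, for illustration only).
variables = {
    "TRAIN_TT": "Variable(TRAIN_TT)",
    "TRAIN_CO": "Variable(TRAIN_CO)",
}

# Explicit alternative: bind only the names that the model needs.
TRAIN_TT = variables["TRAIN_TT"]

# Bulk alternative: inject every key of the mapping as a module-level name.
globals().update(variables)

print(TRAIN_TT)
print(TRAIN_CO)  # this name exists only because of the update above
```

The explicit form keeps the origin of each name visible in the script, which is the same reason why importing the operators one by one is preferred over the wildcard import.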

What initial values should I select for the parameters?

If you have the results of a previous estimation, it may be a good idea to use the estimated values as a starting point for the estimation of a similar model. If not, it depends on the nature of the parameters:

If the parameter is a coefficient (traditionally denoted by β), the value 0 is appropriate.
If the parameter is a nest parameter of a nested or cross-nested logit model (traditionally denoted by μ), the value 1 is appropriate. Make sure to set the lower bound of the parameter to 1.
If the parameter is the nest membership coefficient of a cross-nested logit model (traditionally denoted by α), the value 0.5 is appropriate. Make sure to set the lower bound to 0 and the upper bound to 1.
If the parameter captures the membership to a class of a latent class model, the value 0.5 is appropriate. Make sure to set the lower bound to 0 and the upper bound to 1.
If the parameter is the scale of an error component in a mixture of logit model (traditionally denoted by σ), the value must be sufficiently large so that the likelihood of each observation is not too close to zero. It is suggested to try first with the value 1. If there are numerical issues, try a larger value, such as 10. See Section 7 in the report "Estimating choice models with latent variables with PandasBiogeme" for a detailed discussion.
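
The advice about scale parameters can be illustrated with a small numerical sketch. It assumes that σ is the standard deviation of a normally distributed error term whose density enters the likelihood, as in the measurement equations of a latent variable model. The function normal_density and the residual value are hypothetical and chosen only for illustration.

```python
import math


def normal_density(residual: float, sigma: float) -> float:
    """Density at `residual` of a zero-mean normal with standard deviation `sigma`."""
    z = residual / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# Hypothetical residual between an observed and a predicted value.
residual = 5.0

# Compare a very small, a moderate and a large starting value for sigma.
for sigma in (0.1, 1.0, 10.0):
    density = normal_density(residual, sigma)
    print(f"sigma = {sigma:5.1f}  density = {density:.3e}")
```

With sigma = 0.1 the exponential underflows and the density is exactly zero in double precision, so the log likelihood of that observation cannot be evaluated. With the values 1 and 10 it stays strictly positive.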

Can I save intermediate iterations during the estimation?

Yes. Actually, Biogeme does it automatically for you. If the name of your model is mymodel, Biogeme creates a file named __mymodel.iter, where the values of the parameters are saved after each successful iteration of the estimation algorithm. When the estimation starts, Biogeme checks whether this file exists. If it does, the stored values are used as a starting point for the estimation. Set the attribute saveIterations of the BIOGEME object to False if you want to turn that feature off. [Click here for the documentation]

Does Biogeme provide support for out-of-sample validation?

Yes.

You first split the data into estimation and validation samples. [Click here for the documentation.]
You can use the validate function [Click here for the documentation] to perform a full validation. For each slice generated by the split, the model is re-estimated using all the data except the slice, and the estimated model is applied to the validation set (i.e. the slice). The value of the log likelihood for each observation in the validation set is reported in a dataframe. See example 04validation.py from here.
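
The validation procedure described above can be sketched in plain Python. This is not Biogeme's validate function: the helpers split, estimate and loglikelihood are hypothetical, the data are invented, and the "model" is simply the share of observations choosing the alternative.

```python
import math
import random


def split(n_obs, slices, seed=0):
    """Shuffle the row indices and cut them into `slices` validation sets."""
    indices = list(range(n_obs))
    random.Random(seed).shuffle(indices)
    return [indices[k::slices] for k in range(slices)]


def estimate(choices):
    """Toy 'model': the share of observations with choice equal to 1."""
    return sum(choices) / len(choices)


def loglikelihood(p, choice):
    """Log likelihood of one binary observation given P(choice = 1) = p."""
    return math.log(p if choice == 1 else 1.0 - p)


# Hypothetical binary choices (six of each, so that both outcomes
# always remain in the estimation sample).
choices = [1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0]

for k, validation in enumerate(split(len(choices), slices=3)):
    held_out = set(validation)
    # Estimate on all the data except the current slice.
    estimation = [i for i in range(len(choices)) if i not in held_out]
    p = estimate([choices[i] for i in estimation])
    # Apply the estimated model to each observation of the slice.
    per_obs = {i: loglikelihood(p, choices[i]) for i in validation}
    print(f"slice {k}: p = {p:.3f}, total LL = {sum(per_obs.values()):.3f}")
```

Each observation appears in exactly one validation set, so the per-observation log likelihoods collected over all slices cover the whole sample once.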



      
The init loglikelihood is -1.797693e+308 and no iteration is performed. What should I do?
	
If the model returns a probability 0 for the chosen alternative for at least one observation in the sample, then the likelihood is 0, and the log likelihood is minus infinity. For the sake of robustness, Biogeme assigns the value -1.797693e+308 to the log likelihood in this context.
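
The reported value coincides with the most negative finite double-precision number, as the following sketch shows. The helper safe_log is hypothetical and only illustrates the idea of replacing minus infinity by a finite value. It is not Biogeme's implementation.

```python
import math
import sys

# The most negative finite double-precision number.
lowest = -sys.float_info.max
print(f"{lowest:e}")  # prints -1.797693e+308

# The logarithm of a zero probability cannot be computed:
# math.log raises an exception instead of returning minus infinity.
try:
    math.log(0.0)
except ValueError as error:
    print(f"math.log(0.0) failed: {error}")


def safe_log(probability: float) -> float:
    """Return log(probability), or the most negative finite double if it is zero."""
    return math.log(probability) if probability > 0.0 else lowest


print(safe_log(0.0) == lowest)
```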
	  
A common reason why the model returns a probability 0 for the chosen alternative is that this alternative is declared unavailable. The following code verifies that the availability conditions are compatible with the choice variable:
	    
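import numpy as np

# av contains the availability conditions, CHOICE is the choice variable.
diagnostic = database.checkAvailabilityOfChosenAlt(av, CHOICE)

if not diagnostic.all():
    # Positions of the observations whose chosen alternative is unavailable.
    row_indices = np.where(diagnostic == False)[0]
    print(f'Rows where the chosen alternative is not available: {row_indices}')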
	      
	    
See the documentation of checkAvailabilityOfChosenAlt here.

Another reason is an initial value of a scale parameter that is too close to zero. See the discussion of initial values above.

There are many other possible reasons. The best way to investigate the source of the problem is to use Biogeme in simulation mode, and report the probability of the chosen alternative for each observation. Once you have identified the problematic entries, it is easier to investigate the reason why the model returns a probability of zero.
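
The diagnostic described above amounts to computing the probability of the chosen alternative for every observation and listing the rows where it is zero. The sketch below does this in plain Python for a logit model. It does not use Biogeme's simulation API: the function chosen_probability and the sample data are hypothetical.

```python
import math


def chosen_probability(utilities, availability, chosen):
    """Logit probability of the chosen alternative among the available ones."""
    if not availability[chosen]:
        return 0.0
    denominator = sum(
        math.exp(v) for alt, v in utilities.items() if availability[alt]
    )
    return math.exp(utilities[chosen]) / denominator


# Hypothetical sample: utilities, availabilities and choice per observation.
sample = [
    ({1: -0.5, 2: 0.0, 3: -1.2}, {1: True, 2: True, 3: True}, 2),
    # The chosen alternative 3 is declared unavailable.
    ({1: -0.3, 2: 0.0, 3: -0.8}, {1: True, 2: True, 3: False}, 3),
    # The utility of the chosen alternative is so low that exp underflows.
    ({1: -900.0, 2: 0.0, 3: -1.0}, {1: True, 2: True, 3: True}, 1),
]

probabilities = [chosen_probability(*observation) for observation in sample]
problematic = [row for row, p in enumerate(probabilities) if p == 0.0]
print(f"Rows with zero probability for the chosen alternative: {problematic}")
```

The two flagged rows have different causes, an availability conflict and a numerical underflow, which is why inspecting the problematic entries one by one is more informative than looking at the total log likelihood.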

Getting help