Getting help
groups.google.com/d/forum/biogeme
The forum is moderated. Please keep the following in mind before posting a question:
- Check that the same question has not already been addressed on the forum.
- Try to submit only questions about the software.
- Make sure to read the documentation completely and to try the examples before submitting a question.
- Do not submit large files (typically, data files) to the forum.
- Why is the file headers.py not generated?
- What initial values should I select for the parameters?
- Can I save intermediate iterations during the estimation?
- Does Biogeme provide support for out-of-sample validation?
- The init loglikelihood is -1.797693e+308 and no iteration is performed. What should I do?
Why is the file headers.py not generated?
To comply better with good Python programming practice, the syntax to import the variable names from the data file has been modified since version 3.2.5. The file headers.py is no longer generated. The statement
from headers import *
must be replaced by
globals().update(database.variables)
where database is the object containing the database, created as follows:
import pandas as pd
import biogeme.database as db
df = pd.read_csv("swissmetro.dat", sep='\t')
database = db.Database("swissmetro", df)
Moreover, in order to avoid any ambiguity, the operators used by Biogeme must be explicitly imported. For instance:
from biogeme.expressions import Beta, DefineVariable, bioDraws, PanelLikelihoodTrajectory, MonteCarlo, log
Note that it is also possible to import all of them using the following syntax:
from biogeme.expressions import *
although this is not good Python programming practice.
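To illustrate the mechanism behind globals().update(database.variables), here is a minimal sketch in plain Python. The class Variable and the dictionary variables are hypothetical stand-ins for Biogeme's expression objects and for database.variables, and the column names are only illustrative. The sketch shows how the entries of a dictionary become ordinary Python names, and how the same objects can be accessed explicitly instead.

```python
# Hypothetical stand-in for a Biogeme expression object (not Biogeme's class).
class Variable:
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f"Variable({self.name!r})"


# Hypothetical stand-in for database.variables: a dict mapping each
# column name to an object representing that column.
variables = {name: Variable(name) for name in ("TRAIN_TT", "TRAIN_CO", "CHOICE")}

# Inject every dict entry into the module namespace, which is what
# globals().update(...) does with any dictionary.
globals().update(variables)

# The column names are now ordinary Python names.
print(TRAIN_TT)

# An explicit alternative that does not touch globals():
train_co = variables["TRAIN_CO"]
print(train_co)
```

The explicit lookup is more verbose, but it keeps the origin of each name visible in the script.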
What initial values should I select for the parameters?
- If the parameter is a coefficient (traditionally denoted by β), the value 0 is appropriate.
- If the parameter is a nest parameter of a nested or cross-nested logit model (traditionally denoted by μ), the value 1 is appropriate. Make sure to set the lower bound of the parameter to 1.
- If the parameter is the nest membership coefficient of a cross-nested logit model (traditionally denoted by α), the value 0.5 is appropriate. Make sure to set the lower bound to 0 and the upper bound to 1.
- If the parameter captures the membership to a class of a latent class model, the value 0.5 is appropriate. Make sure to set the lower bound to 0 and the upper bound to 1.
- If the parameter is the scale of an error component in a mixture of logit model (traditionally denoted by σ), the value must be sufficiently large so that the likelihood of each observation is not too close to zero. It is suggested to try the value 1 first. If there are numerical issues, try a larger value, such as 10. See Section 7 in the report "Estimating choice models with latent variables with PandasBiogeme" for a detailed discussion.
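The suggestions above can be collected in a small lookup table. This is only a sketch in plain Python: the record type Start, the dictionary SUGGESTED and the function is_feasible are hypothetical names introduced for illustration and are not part of Biogeme. None stands for "no bound", and the entry for σ uses the first value suggested above.

```python
from typing import NamedTuple, Optional


class Start(NamedTuple):
    """Hypothetical record: initial value and bounds for one parameter."""
    value: float
    lower: Optional[float]
    upper: Optional[float]


# Suggested starting points, transcribed from the list above.
SUGGESTED = {
    "coefficient (beta)": Start(0.0, None, None),
    "nest parameter (mu)": Start(1.0, 1.0, None),
    "nest membership (alpha)": Start(0.5, 0.0, 1.0),
    "class membership": Start(0.5, 0.0, 1.0),
    "error component scale (sigma)": Start(1.0, None, None),
}


def is_feasible(start: Start) -> bool:
    """Check that the initial value lies within its bounds."""
    above = start.lower is None or start.value >= start.lower
    below = start.upper is None or start.value <= start.upper
    return above and below


for kind, start in SUGGESTED.items():
    print(f"{kind}: value={start.value}, "
          f"bounds=({start.lower}, {start.upper}), ok={is_feasible(start)}")
```

In a Biogeme script, these numbers would typically be supplied as the initial value and the bounds when the parameter is declared with Beta.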
Can I save intermediate iterations during the estimation?
Yes. Use the parameter saveIterations in the estimate function (see the documentation). Each time the value of the log likelihood function is improved, Biogeme saves the current value of all parameters in a file. The name of the file is defined by the parameter file_iterations. By default, it is __savedIterations.txt.
If you want to start the iterations from the values in the file, call the function loadSavedIteration (see the documentation) before starting the estimation.
Example:
import biogeme.biogeme as bio
biogeme = bio.BIOGEME(database, logprob)
fname = "myIterations.log"
# Start from the values saved during a previous run
biogeme.loadSavedIteration(filename=fname)
results = biogeme.estimate(saveIterations=True, file_iterations=fname)
Does Biogeme provide support for out-of-sample validation?
Yes.
- You can extract a random sample of the database (see documentation).
- You can split your database into slices, and generate estimation and validation data sets (see documentation).
- You can use the validate function (see documentation) to perform a full validation. The validation consists of organizing the data into several randomly defined slices of about the same size. Each slice is considered in turn as a validation dataset. The model is re-estimated using all the data except the slice, and the estimated model is applied to the validation set (i.e. the slice). The value of the log likelihood for each observation in the validation set is reported in a dataframe. As this is done for each slice, the output is a list of dataframes, one per slice. See example04validation.py.
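The slicing procedure described above can be sketched in plain Python. This is not Biogeme's implementation: the functions make_slices and estimation_validation_pairs are hypothetical names, and the sketch only manipulates row indices, leaving out the estimation itself. It shows how rows are randomly assigned to slices of about the same size, and how each slice serves once as the validation set while the remaining rows are used for estimation.

```python
import random


def make_slices(n_rows, n_slices, seed=None):
    """Randomly partition row indices 0..n_rows-1 into n_slices groups
    of about the same size."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Slice i takes every n_slices-th shuffled index, starting at i.
    return [sorted(indices[i::n_slices]) for i in range(n_slices)]


def estimation_validation_pairs(n_rows, n_slices, seed=None):
    """Yield (estimation_rows, validation_rows) for each slice in turn."""
    for held_out in make_slices(n_rows, n_slices, seed):
        held = set(held_out)
        estimation = [i for i in range(n_rows) if i not in held]
        yield estimation, held_out


for est, val in estimation_validation_pairs(n_rows=10, n_slices=3, seed=0):
    print(f"estimate on {est}, validate on {val}")
```

Every row appears in exactly one validation set, so each observation receives exactly one out-of-sample log likelihood value.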
The init loglikelihood is -1.797693e+308 and no iteration is performed. What should I do?
If the model returns a probability 0 for the chosen alternative for at least one observation in the sample, then the likelihood is 0, and the log likelihood is minus infinity. For the sake of robustness, Biogeme assigns the value -1.797693e+308 to the log likelihood in this context.
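The following sketch in plain Python illustrates the arithmetic. The reported value coincides, at the printed precision, with the most negative finite double-precision number. The functions log_likelihood and finite_log_likelihood are hypothetical, the probabilities are made up, and the way the replacement is done here is an assumption for illustration, not a description of Biogeme's internals.

```python
import math
import sys


def log_likelihood(chosen_probabilities):
    """Sum of log probabilities; a zero probability gives minus infinity."""
    total = 0.0
    for p in chosen_probabilities:
        total += math.log(p) if p > 0.0 else -math.inf
    return total


def finite_log_likelihood(chosen_probabilities):
    """Replace minus infinity by the most negative finite double
    (an assumed convention, for illustration only)."""
    return max(log_likelihood(chosen_probabilities), -sys.float_info.max)


# Made-up probabilities of the chosen alternative; the third one is zero.
probs = [0.7, 0.2, 0.0, 0.9]
print(log_likelihood(probs))
print(f"{finite_log_likelihood(probs):e}")
```

A single observation with probability zero is enough to drive the whole sum to minus infinity, whatever the other observations contribute.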
A common reason why the model returns a probability 0 for the chosen alternative is that the alternative is declared unavailable. The following code verifies that the availability conditions are compatible with the choice variable:
import numpy as np
diagnostic = database.checkAvailabilityOfChosenAlt(av, CHOICE)
if not diagnostic.all():
    # Positions of the rows where the chosen alternative is unavailable
    row_indices = np.where(diagnostic == False)[0]
    print(f'Rows where the chosen alternative is not available: {row_indices}')
See the documentation of checkAvailabilityOfChosenAlt.
Another reason is that the initial value of a scale parameter is too close to zero. See the discussion of initial values for scale parameters above.
There are many other possible reasons. The best way to investigate the source of the problem is to use Biogeme in simulation mode, and report the probability of the chosen alternative for each observation. Once you have identified the problematic entries, it is easier to investigate why the model returns a probability of zero.
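Once the probability of the chosen alternative is available for each observation, finding the problematic entries is a simple filter. The sketch below uses plain Python: the function problematic_rows is a hypothetical helper, and the list simulated is made-up data standing in for the output of the simulation. How the probabilities are obtained from Biogeme is not shown.

```python
def problematic_rows(chosen_probabilities, threshold=0.0):
    """Return the row indices whose chosen-alternative probability
    is less than or equal to the threshold."""
    return [i for i, p in enumerate(chosen_probabilities) if p <= threshold]


# Made-up simulated probabilities of the chosen alternative,
# one per observation.
simulated = [0.42, 0.0, 0.87, 1e-320, 0.0]

# Rows where the probability is exactly zero.
print(problematic_rows(simulated))

# Rows where the probability is zero or numerically negligible.
print(problematic_rows(simulated, threshold=1e-300))
```

A small positive threshold also catches observations whose probability is not exactly zero but is close enough to cause numerical issues.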