Getting help
groups.google.com/d/forum/biogeme
The forum is moderated. Please keep the following in mind before posting a question:
- Check that the same question has not already been addressed on the forum.
- Try to submit only questions about the software.
- Read the documentation completely and try the examples before submitting a question.
- Do not submit large files (typically, data files) to the forum.
- Why is the file headers.py not generated?
- What initial values should I select for the parameters?
- Can I save intermediate iterations during the estimation?
- Does Biogeme provide support for out-of-sample validation?
- The init loglikelihood is -1.797693e+308 and no iteration is performed. What should I do?
Why is the file headers.py not generated?
To comply better with good Python programming practice, the syntax to import the variable names from the data file has changed since version 3.2.5. The file headers.py is no longer generated. The statement
from headers import *
must be replaced by
globals().update(database.variables)
where database is the object containing the database, created as follows:
import pandas as pd
import biogeme.database as db
df = pd.read_csv("swissmetro.dat", sep='\t')
database = db.Database("swissmetro", df)
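To see what the globals().update(...) call does, here is a minimal sketch in plain Python. The dictionary named variables is a hypothetical stand-in for database.variables, assumed here to map each column name to an object representing that column. The strings used as values are placeholders only.

```python
# Hypothetical stand-in for database.variables: column name -> object.
# In Biogeme the values would be expression objects, not strings.
variables = {
    "TRAIN_TT": "<variable TRAIN_TT>",
    "TRAIN_CO": "<variable TRAIN_CO>",
    "CHOICE": "<variable CHOICE>",
}

# Inject every key of the dictionary as a module-level name.
globals().update(variables)

# The column names can now be used as ordinary Python identifiers.
print(globals()["TRAIN_TT"])
print(globals()["CHOICE"])
```

If you prefer not to modify the global namespace, the same objects can be bound explicitly, one name at a time, for example TRAIN_TT = variables["TRAIN_TT"].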
Moreover, in order to avoid any ambiguity, the operators used by Biogeme must be explicitly imported. For instance:
from biogeme.expressions import Beta, DefineVariable, bioDraws, PanelLikelihoodTrajectory, MonteCarlo, log
Note that it is also possible to import all of them using the following syntax:
from biogeme.expressions import *
although this is not good Python programming practice.
What initial values should I select for the parameters?
- If the parameter is a coefficient (traditionally denoted by β), the value 0 is appropriate.
- If the parameter is a nest parameter of a nested or cross-nested logit model (traditionally denoted by μ), the value 1 is appropriate. Make sure to set the lower bound of the parameter to 1.
- If the parameter is the nest membership coefficient of a cross-nested logit model (traditionally denoted by α), the value 0.5 is appropriate. Make sure to set the lower bound to 0 and the upper bound to 1.
- If the parameter captures the membership to a class of a latent class model, the value 0.5 is appropriate. Make sure to set the lower bound to 0 and the upper bound to 1.
- If the parameter is the scale of an error component in a mixture of logit model (traditionally denoted by σ), the value must be sufficiently large so that the likelihood of each observation is not too close to zero. It is suggested to try the value 1 first. If there are numerical issues, try a larger value, such as 10. See Section 7 in the report "Estimating choice models with latent variables with PandasBiogeme" for a detailed discussion.
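The recommendations above can be summarized as a small lookup table. This is only a sketch in plain Python: the StartingPoint helper and the dictionary keys are hypothetical names introduced for illustration, and None is used to mean that no bound is imposed.

```python
from typing import NamedTuple, Optional


class StartingPoint(NamedTuple):
    """Hypothetical helper: initial value and bounds for one parameter type."""
    value: float
    lower: Optional[float]
    upper: Optional[float]


# Starting values and bounds taken from the list above.
# None means that no bound is imposed.
STARTING_POINTS = {
    "coefficient (beta)": StartingPoint(0.0, None, None),
    "nest parameter (mu)": StartingPoint(1.0, 1.0, None),
    "nest membership (alpha)": StartingPoint(0.5, 0.0, 1.0),
    "class membership": StartingPoint(0.5, 0.0, 1.0),
    "error component scale (sigma)": StartingPoint(1.0, None, None),
}

for kind, sp in STARTING_POINTS.items():
    # Every starting value must lie within its own bounds.
    assert sp.lower is None or sp.lower <= sp.value
    assert sp.upper is None or sp.value <= sp.upper
    print(f"{kind}: value={sp.value}, lower={sp.lower}, upper={sp.upper}")
```

In a Biogeme script, such a triple would typically be passed to Beta, for instance Beta('MU', 1, 1, None, 0), assuming the five-argument form (name, value, lower bound, upper bound, status) used in the Biogeme examples.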
Can I save intermediate iterations during the estimation?
Yes. Biogeme does it automatically. If the name of your model is mymodel, Biogeme creates a file named __mymodel.iter, where the values of the parameters are saved after each successful iteration of the estimation algorithm. When the estimation starts, Biogeme checks whether this file exists. If it does, the values stored in it are used as the starting point for the estimation.
Set the attribute saveIterations of the BIOGEME object to False if you want to turn that feature off. The attribute is described in the Biogeme documentation.
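The save-and-restart mechanism described above can be illustrated with a short sketch. It is not Biogeme's implementation: the JSON file format, the helper functions load_or_default and save_iteration, and the parameter names are all assumptions made for this example.

```python
import json
import os
import tempfile


def load_or_default(path, defaults):
    """Return saved parameter values if the file exists, else the defaults."""
    if os.path.exists(path):
        with open(path) as f:
            saved = json.load(f)
        # Only overwrite parameters that are part of the current model.
        return {name: saved.get(name, value) for name, value in defaults.items()}
    return dict(defaults)


def save_iteration(path, values):
    """Overwrite the file with the current parameter values."""
    with open(path, "w") as f:
        json.dump(values, f)


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "__mymodel.iter")
    defaults = {"B_TIME": 0.0, "B_COST": 0.0}

    # First run: no file yet, so the defaults are used.
    first_start = load_or_default(path, defaults)

    # Pretend that one successful iteration produced these values.
    save_iteration(path, {"B_TIME": -1.2, "B_COST": -0.8})

    # Second run: the saved values become the starting point.
    second_start = load_or_default(path, defaults)

print(first_start)
print(second_start)
```

A practical consequence of this behavior: if you want an estimation to start again from the initial values written in your script, delete the existing __mymodel.iter file first.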
Does Biogeme provide support for out-of-sample validation?
Yes.
- First split the data into estimation and validation samples, as described in the documentation.
- Then use the validate function, also described in the documentation, to perform a full validation. For each slice generated by the split, the model is re-estimated using all the data except the slice, and the estimated model is applied to the validation set (i.e. the slice). The value of the log likelihood for each observation in the validation set is reported in a dataframe. See the example example04validation.py.
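The logic of the validation procedure can be sketched in plain Python. This is not the validate function itself: the toy "model" is just the share of observations choosing an alternative, the slices are built in round-robin order for simplicity, and the data and function names are hypothetical.

```python
import math


def split_into_slices(data, n_slices):
    """Assign observations to slices in round-robin order."""
    return [data[i::n_slices] for i in range(n_slices)]


def estimate(sample):
    """Toy 'model': the share of observations choosing alternative 1."""
    return sum(sample) / len(sample)


def validate_sketch(data, n_slices):
    """Re-estimate on all slices but one, evaluate on the held-out slice."""
    slices = split_into_slices(data, n_slices)
    results = []
    for k, validation in enumerate(slices):
        estimation = [y for j, s in enumerate(slices) if j != k for y in s]
        p = estimate(estimation)
        # Log likelihood of each held-out observation under the toy model.
        loglike = [math.log(p if y == 1 else 1.0 - p) for y in validation]
        results.append(loglike)
    return results


# Hypothetical binary choices (1 = alternative chosen, 0 = not chosen).
choices = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
per_slice = validate_sketch(choices, n_slices=3)
for k, loglike in enumerate(per_slice):
    print(f"slice {k}: {len(loglike)} observations, total = {sum(loglike):.3f}")
```

Each inner list plays the role of one validation dataframe: one log likelihood value per observation of the held-out slice.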
The init loglikelihood is -1.797693e+308 and no iteration is performed. What should I do?
If the model returns a probability 0 for the chosen alternative for at least one observation in the sample, then the likelihood is 0, and the log likelihood is minus infinity. For the sake of robustness, Biogeme assigns the value -1.797693e+308 to the log likelihood in this context.
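The value -1.797693e+308 is the negative of the largest finite double-precision number, which makes it a finite stand-in for minus infinity. The following sketch checks this in plain Python. It says nothing about how Biogeme produces the value internally.

```python
import math
import sys

# Largest finite double-precision number on this platform.
largest = sys.float_info.max

# Its negative, printed with six decimals, matches the reported value.
reported = f"{-largest:e}"
print(reported)

# The logarithm of a zero probability is undefined in floating point.
try:
    math.log(0.0)
    log_of_zero_fails = False
except ValueError:
    log_of_zero_fails = True
print(log_of_zero_fails)
```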
A common reason why the model returns a probability 0 for the chosen alternative is that this alternative is declared unavailable. The following code verifies that the availability conditions are compatible with the choice variable:
import numpy as np
diagnostic = database.checkAvailabilityOfChosenAlt(av, CHOICE)
if not diagnostic.all():
    # Indices of the rows where the chosen alternative is unavailable
    row_indices = np.where(diagnostic == False)[0]
    print(f'Rows where the chosen alternative is not available: {row_indices}')
See the documentation of checkAvailabilityOfChosenAlt for details.
Another reason is when the initial value of a scale parameter is too close to zero. See the discussion above.
There are many other possible reasons. The best way to investigate the source of the problem is to use Biogeme in simulation mode and report the probability of the chosen alternative for each observation. Once you have identified the problematic observations, it is easier to investigate why the model returns a probability of zero.
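The diagnostic idea, reporting the probability of the chosen alternative for each observation and isolating the zeros, can be sketched without Biogeme. The example below uses a toy logit model in plain Python. The utilities, availabilities, choices and the function logit_probabilities are hypothetical and serve only to illustrate the procedure.

```python
import math


def logit_probabilities(utilities, availability):
    """Logit probabilities over available alternatives; 0 for the others."""
    weights = {
        alt: math.exp(v) if availability[alt] else 0.0
        for alt, v in utilities.items()
    }
    total = sum(weights.values())
    return {alt: w / total for alt, w in weights.items()}


# Hypothetical observations: utilities, availabilities and chosen alternative.
observations = [
    {"V": {1: -1.0, 2: -0.5, 3: 0.0}, "av": {1: True, 2: True, 3: True}, "choice": 2},
    {"V": {1: -2.0, 2: -0.1, 3: 0.3}, "av": {1: True, 2: True, 3: False}, "choice": 3},
    {"V": {1: 0.2, 2: -0.4, 3: -1.5}, "av": {1: True, 2: False, 3: True}, "choice": 1},
]

# Probability of the chosen alternative for each observation.
chosen_probabilities = [
    logit_probabilities(obs["V"], obs["av"])[obs["choice"]] for obs in observations
]
# Rows where that probability is exactly zero.
problematic = [row for row, p in enumerate(chosen_probabilities) if p == 0.0]

for row, p in enumerate(chosen_probabilities):
    print(f"row {row}: probability of chosen alternative = {p:.4f}")
print(f"Rows with zero probability: {problematic}")
```

In this toy data set, the second observation chooses an alternative that is flagged as unavailable, so it is the one reported as problematic.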