BISON BIOGEME Model specification file

The file mymodel.mod contains the specification of the discrete choice model to be estimated. The sections of this file have to be specified as described below. Note that comments can be included using //. All characters after this command, up to the end of the current line, are ignored.

[ModelDescription]

Type here any text that describes the model. It may contain several lines. Each line must be enclosed in double quotes, like this:

[ModelDescription]
"This is the first line of the model description"
"This is the second line of the model description"

Note that it will be copied verbatim in the LaTeX file. Therefore, if it contains special characters which are interpreted by LaTeX, such as $ or &, you may need to edit the LaTeX file before processing it.

[Choice]

Provide here the formula to compute the identifier of the chosen alternative from the data file. Typically, a "choice" entry will be available directly in the file, but any formula can be used to compute it. Assume, for example, that you have numbered the alternatives 100, 200 and 300 in the model, but that they are numbered 1, 2 and 3 in the data file. In this case, you must write

[Choice]
100  *  choice  
Any expression described in the section [Expressions] is valid here.

[Weight]

Provide here the formula to compute the weight associated with each observation. The weight of an observation multiplies the corresponding term in the log-likelihood function. Ideally, the sum of the weights should be equal to the total number of observations, although this is not required. The file reporting the statistics contains a recommendation to adjust the weights in order to comply with this convention.

Important: do not use weights in Biosim.
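
As an illustration of the convention on the sum of the weights, suppose (hypothetically) that the data file contains a column rawWeight, that the sample has 1000 observations, and that the raw weights sum to 2500. Multiplying each raw weight by 1000/2500 = 0.4 makes the weights sum to the number of observations. Assuming the expression syntax of the section [Expressions], this could be written as follows (a sketch, not a required form):

      [Weight]
      0.4 * rawWeight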

[Beta]

Each line of this section corresponds to a parameter of the utility functions. Five entries must be provided for each parameter:

  1. Name: the first character must be a letter (any case) or an underscore (_), followed by a sequence of letters, digits, underscore (_) or dashes (-), and terminated by a white space. Note that case sensitivity is enforced. Therefore varname and Varname would represent two different variables.
  2. Default value that will be used as a starting point for the estimation, or used directly for the simulation in Biosim.
  3. Lower bound on the valid values. Bounds specification is mandatory in Biogeme. If you do not want bounds, use a large negative value for the lower bound and a large positive value for the upper bound. A bound that is not active at the solution plays no role other than safeguarding the algorithm;
  4. Upper bound on the valid values;
  5. Status, which is 0 if the parameter must be estimated, or 1 if the parameter has to be maintained at the given default value.
Note that this section is independent of the specific model to be estimated, as it captures the deterministic part of the utility function.
      [Beta]
      // Name  Value      LowerBound  UpperBound status
         ASC1  0          -10000      10000      1
         ASC2  -0.159016  -10000      10000      0
         ASC3  -0.0869287 -10000      10000      0
         ASC4  -0.51122   -10000      10000      0
         ASC5  0.718513   -10000      10000      0
         ASC6  -1.39177   -10000      10000      0
         BETA1 0.778982   -10000      10000      0
         BETA2 0.809772   -10000      10000      0

[Mu]

μ is the homogeneity parameter of the MEV model. Usually, it is constrained to be one. However, Biogeme can estimate it if requested (see example 10nestedBottom.mod for a nested logit model normalized from the bottom, so that μ is estimated). Four entries are specified here:

  1. Default value that will be used as a starting point for the estimation (common value: 1.0);
  2. Lower bound on the valid values (common value: 1.0e-5);
  3. Upper bound on the valid values (common value: 1.0);
  4. Status, which is 0 if the parameter must be estimated, or 1 if the parameter has to be maintained at the given value.
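
As an illustration, the following sketch uses the common values listed above and keeps μ fixed at 1 (status 1). The comment line is there only for readability.

      [Mu]
      // Value  LowerBound  UpperBound  status
         1.0    1.0e-5      1.0         1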

[Utilities]

Each row of this section corresponds to an alternative. Four entries are specified:

  1. The identifier of the alternative, with a numbering convention consistent with the choice definition;
  2. The name of the alternative: the first character must be a letter (any case) or an underscore (_), followed by a sequence of letters, digits, underscore (_) or dashes (-), and terminated by a white space;
  3. The availability condition: this must be a direct reference to an entry in the data file, or to an expression defined in the section [Expressions];
  4. The linear-in-parameter utility function is composed of a list of terms, separated by a plus sign (+). Each term is composed of the name of a parameter and the name of an attribute, separated by a multiplication sign (*). The parameter must be listed in the section [Beta], if it is a regular parameter. If it is a random parameter, the syntax is
                nameParam [ nameParam ] 
    
    in the case of the normal distribution, or
                nameParam { nameParam }
    
    to get a random parameter that comes from a uniform distribution. For example, in the case of the normal:
                BETA [ SIGMA ] 
    
    Note that the blank after each parameter name is required. Also, parameters BETA and SIGMA have to be listed in the section [Beta]. In the context of an independent random parameter, BETA represents the mean while SIGMA corresponds to the standard deviation. With correlated random parameters, SIGMA technically corresponds to the appropriate term in the Cholesky decomposition matrix that captures the variance-covariance structure among the random parameters. An attribute must be an entry of the data file, or an expression defined in the section [Expressions]. In order to comply with this syntax, the alternative specific constants must appear in a term like ASC * one, where one is defined in the section [Expressions]. Here is an example:
    [Utilities]
    // Id Name  Avail  linear-in-parameter expression
      1   Alt1   av1   ASC1 * one + BETA1 [ SIGMA ] * x11 + BETA2 * x12
      2   Alt2   av2   ASC2 * one + BETA1 [ SIGMA ] * x21 + BETA2 * x22
      3   Alt3   av3   ASC3 * one + BETA1 [ SIGMA ] * x31 + BETA2 * x32
      4   Alt4   av4   ASC4 * one + BETA1 [ SIGMA ] * x41 + BETA2 * x42
      5   Alt5   av5   ASC5 * one + BETA1 [ SIGMA ] * x51 + BETA2 * x52
      6   Alt6   av6   ASC6 * one + BETA1 [ SIGMA ] * x61 + BETA2 * x62
    
    If the utility function does not contain any part which is linear-in-parameters, then the keyword $NONE must be written. For example:
    [Utilities]
    // Id Name  Avail linear-in-parameter expression
      1   Alt1   av1  $NONE
    

[GeneralizedUtilities]

This section enables the user to add nonlinear terms to the utility function. For each alternative, the syntax is simply the identifier of the alternative, followed by the expression. For example, if the utility of alternative 1 is

$\beta_1 x_{11} + \beta_2 (x_{12}^{\lambda} - 1)/\lambda$,

the syntax is
      [Utilities]
      1 Alt1 av1 BETA_1 * X11

      [GeneralizedUtilities]
      1 BETA_2 * (X12 ^ LAMBDA - 1) / LAMBDA
Another example where a non-linear part is required is when specifying a log-normal random coefficient.

[ParameterCovariances]

Biogeme allows normally distributed random parameters to be correlated, and can estimate their covariance. By default, the variance-covariance matrix of the random parameters is assumed to be diagonal, and no covariance is estimated. If some covariances must be estimated, each pair of correlated random coefficients must be identified in this section. Each entry of the section should contain:
  1. The name of the first random parameter in the given pair. If it appears in the utility function as BETA [ SIGMA ], its name must be typed BETA_SIGMA.
  2. The name of the second random parameter involved in the pair, using the same naming convention.
  3. The default value that will be used as a starting point for the estimation;
  4. The lower bound on the valid values;
  5. The upper bound on the valid values;
  6. The status, which is 0 if the parameter must be estimated, or 1 if the parameter has to be maintained at the given value.
If no covariance is to be estimated, you must either entirely remove the section, or specify $NONE as follows:
      [ParameterCovariances]
      $NONE
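
When a covariance is to be estimated, a row could look like the following sketch. It assumes that two random parameters appear in the utility functions as BETA1 [ SIGMA1 ] and BETA2 [ SIGMA2 ]; these names, the starting value and the bounds are illustrative only.

      [ParameterCovariances]
      // Param1        Param2        Value  LowerBound  UpperBound  status
         BETA1_SIGMA1  BETA2_SIGMA2  0.0    -10000      10000       0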

[Draws]

Number of draws to be used in Maximum Simulated Likelihood estimation.

[Expressions]

This section defines all expressions appearing either in the availability conditions or in the utility functions of the alternatives defined in the section [Utilities]. If an expression is readily available from the data file, it can be omitted from the list. It is good practice to generate new variables in this section, especially when one objective is to compute market shares or to evaluate the effects of policies with the help of Biosim.

We now summarize the syntax that can be used for generating new variables. Variables which form an expression might be of type float or of type integer. You can use numerical values or the name of a numerical variable. New variables can be created using unary and binary expression operators.

Unary expressions
  • y = sqrt(x) // y is square root of x.
  • y = log(x) // y is natural log of x.
  • y = exp(x) // y is exponential of x.
  • y = abs(x) // y is absolute value of x.
Numerical binary expressions
  • y = x + z // y is sum of variables x and z
  • y = x - z // y is difference of variables x and z
  • y = x * z // y is product of variables x by z
  • y = x / z // y is division of variable x by z
  • y = x ^ z // y is x to power of z (square would be y = x ^ 2)
  • y = x % z // y is x modulo z, i.e. the remainder of x/z
Logical binary expressions
  • y = x == z // y is 1 if x equals z, 0 otherwise
  • y = x != z // y is 1 if x not equal to z, 0 otherwise
  • y = x || z // y is 1 if x != 0 OR z != 0, 0 otherwise
  • y = x && z // y is 1 if x != 0 AND z != 0, 0 otherwise
  • y = x < z // y is 1 if x < z (note: also > )
  • y = x <= z // y is 1 if x <= z (note: also >= )
  • y = max(x,z) // y is max of x and z (note: also min)

Note that an expression is considered to be TRUE if it is non zero, and FALSE if it is zero. For a full description of these expressions and alternative syntax, please look at the files patSpecParser.y and patSpecScanner.l in the Biogeme distribution.
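
The operators above can be combined as in the following sketch. The names cost, hour and carOwner are hypothetical columns of the data file, used here only for illustration; one is the constant used with the alternative specific constants in the section [Utilities].

      [Expressions]
      one      = 1
      logCost  = log( cost )
      peakHour = ( hour >= 7 ) && ( hour <= 9 )
      carAvail = ( carOwner != 0 )

Here peakHour and carAvail take the value 1 when the condition holds and 0 otherwise, so they can be used directly as attributes or as availability conditions.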

Loops can be defined if several expressions have almost the same syntax. The idea is to replace all occurrences of a string, say xx, by numbers. The numbers are generated within a loop, defined by 3 numbers: the start of the loop (a), the end of the loop (b) and the step (c) with the following syntax:

         $LOOP {xx a b c}
The expression
      $LOOP {xx 1 5 2} my_expression_xx = other_expression_xx * term_xx_first
is equivalent to
      my_expression_1 = other_expression_1 * term_1_first
      my_expression_3 = other_expression_3 * term_3_first
      my_expression_5 = other_expression_5 * term_5_first
Warning: make sure that the string is awkward enough so that it cannot match any other instance by mistake. For example, the loop
      $LOOP {xp 1 5 2} my_expression_xp = other_expression_xp * term_xp_first
is equivalent to
      my_e1ression_1 = other_e1ression_1 * term_1_first
      my_e3ression_3 = other_e3ression_3 * term_3_first
      my_e5ression_5 = other_e5ression_5 * term_5_first
which is probably not the desired effect.

[Group]

Provide here the formula to compute the group ID of the observed individual. Typically, a "group" entry will be available directly from the data file, but any formula can be used to compute it. Any expression described in the section [Expressions] is valid here. A different scale parameter will be estimated for the utility of each group.

[Exclude]

Define an expression (see section [Expressions]) which identifies entries of the data file to be excluded. If the result of the expression is not zero, the entry will be discarded.
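
For example, assuming hypothetical data columns purpose and choice, the following sketch discards every row whose trip purpose is not 1 or whose recorded choice is 0:

      [Exclude]
      ( purpose != 1 ) || ( choice == 0 )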

[Model]

Specifies which model is to be used. Valid entries are

  • $BP for the binary probit model,
  • $MNL for the logit model,
  • $NL for the single level nested logit model,
  • $CNL for the cross-nested logit model,
  • $NGEV for the network GEV,
  • $OL for the ordered logit model.

[PanelData]

Used to specify the name of the variable (e.g. userID) in the dataset that identifies the observations belonging to a given individual, and to specify the names of the random parameters that are invariant across the observations of a given individual.

[Scale]

A scale parameter is associated with each group. The utility function of each member of a group is multiplied by the associated scale parameter. A typical application is the joint estimation of revealed and stated preferences data. It is therefore possible to estimate a logit model combining both data sources, without resorting to dummy nested structures. Each row of this section corresponds to a group. Five entries are required per row:
  1. Group number: the numbering must be consistent with the group definition;
  2. Default value that will be used as a starting point for the estimation (1.0 is a good guess);
  3. Lower bound on the valid values (should be strictly positive);
  4. Upper bound on the valid values;
  5. Status, which is 0 if the parameter must be estimated, or 1 if the parameter has to be maintained at the given value.
Clearly, one of the groups must have a fixed scale parameter.
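
A sketch for joint estimation on two data sources follows. It assumes a hypothetical data column dataSource equal to 1 for revealed preference rows and 2 for stated preference rows; the bounds are illustrative. The scale of group 1 is fixed to 1 (status 1) and the scale of group 2 is estimated.

      [Group]
      dataSource

      [Scale]
      // Group  Value  LowerBound  UpperBound  status
         1      1.0    1.0e-5      10          1
         2      1.0    1.0e-5      10          0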

[SelectionBias]

Identifies the parameters capturing the selection bias, using the estimator proposed by Bierlaire, Bolduc and McFadden (2008). Each of them has to be listed in the section [Beta]. The section must contain a row per alternative for which a selection bias has to be estimated. Each row contains the number of the alternative and the name of the associated parameter. Note that these parameters play a role similar to that of the alternative specific constants, and must not be used with a logit model.
[SelectionBias]
1 SB_1
4 SB_4
6 SB_6

[NLNests]

This section is relevant only if the $NL option has been selected in the section [Model]. If the model to estimate is not a nested logit model, the section will be simply ignored. Note that multilevel nested logit models must be modeled as network GEV models. Each row of this section corresponds to a nest. Six entries are required per row:
  1. Nest name: the first character must be a letter (any case) or an underscore (_), followed by a sequence of letters, digits, underscore (_) or dashes (-), and terminated by a white space;
  2. Default value of the nest parameter μm that will be used as a starting point for the estimation (1.0 is a good guess);
  3. Lower bound on the valid values. It is usually 1.0, if μ is constrained to be 1.0. Do not forget that, for each nest i, the condition μi ≥ μ must be verified to be consistent with discrete choice theory;
  4. Upper bound on the valid values;
  5. Status, which is 0 if the parameter must be estimated, or 1 if the parameter has to be maintained at the given value.
  6. The list of alternatives belonging to the nest, numbered as specified in the section [Utilities]. Make sure that each alternative belongs to exactly one nest, as no automatic verification is implemented in Biogeme.
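
As an illustration, the six alternatives of the [Utilities] example could be split into two nests as follows. The nest names, the upper bound and the allocation of alternatives are assumptions made for this sketch; the nest parameter of NESTB is kept fixed at 1.0.

      [NLNests]
      // Name   Value  LowerBound  UpperBound  status  alternatives
         NESTA  1.0    1.0         10          0       1 2 3
         NESTB  1.0    1.0         10          1       4 5 6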

[CNLNests]

This section is relevant only if the $CNL option has been selected in the section [Model]. If the model to estimate is not a cross-nested logit model, the section will be simply ignored. Note that multilevel cross-nested logit models must be modeled as network GEV models. Each row of this section corresponds to a nest. Five entries are required per row:
  1. Nest name: the first character must be a letter (any case) or an underscore (_), followed by a sequence of letters, digits, underscore (_) or dashes (-), and terminated by a white space;
  2. Default value of the nest parameter μm that will be used as a starting point for the estimation;
  3. Lower bound on the valid values. It is usually 1.0, if μ is constrained to be 1.0. Do not forget that, for each nest i, the condition μi ≥ μ must be verified to be consistent with discrete choice theory;
  4. Upper bound on the valid values;
  5. Status, which is 0 if the parameter must be estimated, or 1 if the parameter has to be maintained at the given value.

[CNLAlpha]

This section is relevant only if the $CNL option has been selected in the section [Model]. If the model to estimate is not a cross-nested logit model, the section will be simply ignored. Each row of this section corresponds to a combination of a nest and an alternative. Six entries are required per row:
  1. Alternative name, as defined in the section [Utilities];
  2. Nest name: the first character must be a letter (any case) or an underscore (_), followed by a sequence of letters, digits, underscore (_) or dashes (-), and terminated by a white space;
  3. Default value of the parameter capturing the level at which an alternative belongs to a nest that will be used as a starting point for the estimation;
  4. Lower bound on the valid values (usually 0.0);
  5. Upper bound on the valid values (usually 1.0);
  6. Status, which is 0 if the parameter must be estimated, or 1 if the parameter has to be maintained at the given value.
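
The sections [CNLNests] and [CNLAlpha] work together. The following sketch assumes three alternatives named Alt1, Alt2 and Alt3 in the section [Utilities], and two hypothetical nests NESTA and NESTB; Alt2 belongs to both nests, and its two membership parameters are estimated. All values and bounds are illustrative.

      [CNLNests]
      // Name   Value  LowerBound  UpperBound  status
         NESTA  1.0    1.0         10          0
         NESTB  1.0    1.0         10          0

      [CNLAlpha]
      // Alt   Nest   Value  LowerBound  UpperBound  status
         Alt1  NESTA  1.0    0.0         1.0         1
         Alt2  NESTA  0.5    0.0         1.0         0
         Alt2  NESTB  0.5    0.0         1.0         0
         Alt3  NESTB  1.0    0.0         1.0         1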

[Ratios]

It is sometimes useful to read the ratio of two estimated coefficients. The most typical case is the value of time, which is the ratio of the time coefficient and the cost coefficient. This feature is only implemented for fixed parameters. The computation of ratios of random parameters is not permitted in this version. Note that it is not straightforward to characterize the distribution of the ratio of two random coefficients. Ben-Akiva, Bolduc and Bradley (1993) suggest a simple approach that is directly implementable in Biogeme to handle ratios of random parameters. Each row in this section specifies a ratio to be reported in the output file. Three entries are required:
  1. The parameter (from the section [Beta]) being the numerator of the ratio;
  2. The parameter (from the section [Beta]) being the denominator of the ratio;
  3. The name of the ratio, to appear in the output file: the first character must be a letter (any case) or an underscore (_), followed by a sequence of letters, digits, underscore (_) or dashes (-), and terminated by a white space.
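
For instance, using the parameters BETA1 and BETA2 of the [Beta] example, a row could be written as in the following sketch. The ratio name ratio_B1_B2 is hypothetical.

      [Ratios]
      // Numerator  Denominator  Name
         BETA1      BETA2        ratio_B1_B2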

[ConstraintNestCoef]

It is possible to constrain nest parameters to be equal. This is achieved by adding to this section expressions like
         NEST_A = NEST_B
where NEST_A and NEST_B are names of nests defined in the section [NLNests], the section [CNLNests] or the section [NetworkGEVNodes]. This section will become obsolete in future releases, as there is now a section for linear constraints on the parameters: section [LinearConstraints].

[NetworkGEVNodes]

This section is relevant only if the $NGEV option has been selected in the section [Model]. If the model to estimate is not a Network GEV model, the section will be simply ignored. Each row of this section corresponds to a node of the network GEV model. All nodes of the network GEV model except the root and the alternatives must be listed here, with their associated parameter. Five entries are required per row:
  1. Node name: the first character must be a letter (any case) or an underscore (_), followed by a sequence of letters, digits, underscore (_) or dashes (-), and terminated by a white space;
  2. Default value of the node parameter μj that will be used as a starting point for the estimation;
  3. Lower bound on the valid values. It is usually 1.0. Check the condition on the parameters for the model to be consistent with the theory in Bierlaire, 2002;
  4. Upper bound on the valid values;
  5. Status, which is 0 if the parameter must be estimated, or 1 if the parameter has to be maintained at the given value.

[NetworkGEVLinks]

This section is relevant only if the $NGEV option has been selected in the section [Model]. If the model to estimate is not a Network GEV model, the section will be simply ignored. Each row of this section corresponds to a link of the network GEV model, starting from the a-node to the b-node. The root node is denoted by __ROOT. All other nodes must be either an alternative or a node listed in the section [NetworkGEVNodes]. Note that an alternative cannot be the a-node of any link, and the root node cannot be the b-node of any link. Six entries are required per row:
  1. Name of the a-node: it must be either __ROOT or a node listed in the section [NetworkGEVNodes].
  2. Name of the b-node: it must be either a node listed in the section [NetworkGEVNodes], or the name of an alternative.
  3. Default value of the link parameter that will be used as a starting point for the estimation;
  4. Lower bound on the valid values.
  5. Upper bound on the valid values;
  6. Status, which is 0 if the parameter must be estimated, or 1 if the parameter has to be maintained at the given value.
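
The two network GEV sections can be combined as in the following sketch, which describes a small network with two intermediate nodes and three alternatives. The node names N1 and N2, the alternative names Alt1 to Alt3, and all values and bounds are assumptions made for illustration; here every link parameter is kept fixed (status 1) and only the node parameters are estimated.

      [NetworkGEVNodes]
      // Name  Value  LowerBound  UpperBound  status
         N1    1.0    1.0         10          0
         N2    1.0    1.0         10          0

      [NetworkGEVLinks]
      // a-node  b-node  Value  LowerBound  UpperBound  status
         __ROOT  N1      1.0    1.0e-5      1.0         1
         __ROOT  N2      1.0    1.0e-5      1.0         1
         N1      Alt1    1.0    1.0e-5      1.0         1
         N1      Alt2    1.0    1.0e-5      1.0         1
         N2      Alt3    1.0    1.0e-5      1.0         1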

[LinearConstraints]

In this section, the user can define a list of linear constraints, in one of the following forms:
  1. Formula = number,
  2. Formula ≤ number,
  3. Formula ≥ number.
The syntax is formally defined as follows:
      oneConstraint : equation <= numberParam | 
                  equation = numberParam | 
                  equation >= numberParam  
      equation: eqTerm |  
              - eqTerm | 
              equation + eqTerm  | 
              equation - eqTerm 
      eqTerm: parameter | numberParam * parameter 
For example, the constraint

Σi ASCi = 0.0

is written
      ASC1 + ASC2 + ASC3 + ASC4 + ASC5 + ASC6 = 0.0
and the constraint

μ ≤ μj

is written
      MU - MUJ <= 0.0
or
      MUJ - MU >= 0.0

[NonLinearEqualityConstraints]

In this section, the user can define a list of nonlinear equality constraints of the form

h(x) = 0.0.

The section must contain a list of functions h(x). For example, the constraint

$\alpha_{a1}^{\mu_a} + \alpha_{b1}^{\mu_b} = 1$

is written
      [NonLinearEqualityConstraints]
      ALPHA_A1 ^ MU_A  + ALPHA_B1 ^ MU_B - 1.0

[NonLinearInequalityConstraints]

Biogeme is not able to handle nonlinear inequality constraints yet. This feature should be available in a future version.

[DiscreteDistributions]

Provide here the list of random parameters with a discrete distribution, or $NONE if there are none in the model. Each discrete parameter is described using the following syntax:
nameDiscreteParam < listOfDiscreteTerms >
where nameDiscreteParam is the name of the random parameter, and listOfDiscreteTerms is recursively defined as
oneDiscreteTerm |
listOfDiscreteTerms oneDiscreteTerm
where oneDiscreteTerm is defined as
nameValueParam ( nameProbaParam )
where nameValueParam is the name of the parameter capturing the discrete value of the random parameter, and nameProbaParam is the name of the parameter capturing the associated probability. Both must be defined in the section [Beta]. As an example,
[DiscreteDistributions]
BETA1 < B1 ( W1 ) B2 ( W2 ) >
defines a random parameter BETA1, which takes the value B1 with probability (or weight) W1, and the value B2 with probability W2. Note that for this to make sense, the constraint W1 + W2 = 1.0 should be imposed (Section [LinearConstraints]). Note also that the parameter BETA1 must not appear in the section [Beta].
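
Putting these requirements together, a specification could look like the following sketch. The starting values and bounds are illustrative assumptions; note that BETA1 itself is not listed in the section [Beta], while B1, B2, W1 and W2 are.

      [Beta]
      // Name  Value  LowerBound  UpperBound  status
         B1    -1.0   -10000      10000       0
         B2    -0.1   -10000      10000       0
         W1    0.5    0           1           0
         W2    0.5    0           1           0

      [DiscreteDistributions]
      BETA1 < B1 ( W1 ) B2 ( W2 ) >

      [LinearConstraints]
      W1 + W2 = 1.0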

[AggregateLast]

Boolean which, for each row in the sample file, identifies whether it is the last observation in an aggregate. Make sure that the value for the last row is nonzero. As with all booleans in Biogeme, a numerical value of 0 means "FALSE" and a numerical value different from 0 means "TRUE". Any expression described in the section [Expressions] is valid here.

[AggregateWeight]

Associates a weight with the elemental observations of an aggregate. Any expression described in the section [Expressions] is valid here.

[LaTeX]

This section defines a description of each parameter, to be used in the LaTeX file. Here is an example:
[LaTeX]
ASC1   "Constant for alt. 1"
ASC2   "Constant for alt. 2"
ASC3   "Constant for alt. 3"
ASC4   "Constant for alt. 4"
ASC5   "Constant for alt. 5"
ASC6   "Constant for alt. 6"
BETA1  "$\beta_1$"
BETA2  "$\beta_2$"

[Derivatives]

This section is for advanced users only. Use it at your own risk. When nonlinear utility functions are used, Biogeme automatically computes the derivatives needed by the maximum likelihood procedure. However, this automatic derivation can significantly slow down the estimation process, as no simplification is performed. This section allows the user to provide Biogeme with the analytical derivatives of the utility function, in order to speed up the estimation process. In some instances, this feature halved the estimation time. A row must be provided for each combination of nonlinear utilities (defined in the section [GeneralizedUtilities]) and parameters involved in the formula. Each of these rows contains three items:
  • the identifier of the alternative,
  • the name of the parameter,
  • the formula of the derivative.
For instance, assume that the systematic utility of alternative 1 is coded in Biogeme as follows:
[Utilities]
// Id Name  Avail  linear-in-parameter expression (beta1*x1 + beta2*x2 + ... )
  1   Alt1   av1   ASC1 * one 
  .
  .
[GeneralizedUtilities]
1  BETA1 * ((x11 + 10 ) ^ LAMBDA11 - 1) / LAMBDA11 + 
   BETA2 * ((x12 + 10 ) ^ LAMBDA12 - 1) / LAMBDA12
Then, the [Derivatives] section can be coded as follows:
[Derivatives]
1 BETA1 ((x11 + 10 ) ^ LAMBDA11 - 1) / LAMBDA11
1 BETA2 ((x12 + 10 ) ^ LAMBDA12 - 1) / LAMBDA12
1 LAMBDA11 
      BETA1 * ((x11 + 10) ^ LAMBDA11 * LN(x11 + 10) * LAMBDA11 
             - (x11 + 10) ^ LAMBDA11 + 1) / (LAMBDA11 * LAMBDA11 )
1 LAMBDA12 
      BETA2 * ((x12 + 10) ^ LAMBDA12 * LN(x12 + 10) * LAMBDA12 
             - (x12 + 10) ^ LAMBDA12 + 1) / (LAMBDA12 * LAMBDA12 )
In addition to usual expressions, the formula may contain the following instruction:
$DERIV( formula , param )
which means that you ask Biogeme to perform the derivation of the formula for you. Although it may be useful to simplify the coding of the derivatives, it is mandatory to use it for random parameters. If BETA [ SIGMA ] is a random parameter, its derivative with respect to BETA is 1, but its derivative with respect to SIGMA cannot be written by the user, and must be coded
$DERIV( BETA [ SIGMA ] , SIGMA )
For instance, assume that the nonlinear utilities are defined as
1 exp( BETA1 [ SIGMA1 ] ) * x11
2 exp( BETA1 [ SIGMA1 ] ) * x21
The derivatives are coded as follows:
[Derivatives]
1 BETA1    exp( BETA1 [ SIGMA1 ] ) * x11
1 SIGMA1   exp( BETA1 [ SIGMA1 ] ) * x11 
              * $DERIV( BETA1 [ SIGMA1 ] , SIGMA1 )
2 BETA1    exp( BETA1 [ SIGMA1 ] ) * x21
2 SIGMA1   exp( BETA1 [ SIGMA1 ] ) * x21 
              * $DERIV( BETA1 [ SIGMA1 ] , SIGMA1 )
It is very easy to make an error when coding the analytical derivatives. If there is an error, Biogeme will not be able to estimate the parameters, and will not even be able to detect that there is an error. Therefore, we strongly suggest setting the parameter gevCheckDerivatives to 1 and making sure that the numerical derivatives match the analytical derivatives sufficiently well. Also, estimate the model with few observations and few draws, once with and once without this section. The results should be exactly the same.

[SNP]

This section implements the test proposed by Fosgerau and Bierlaire, 2007 (read the paper first if you are not familiar with the test). The section is composed of two things:
  1. The name of the random parameter to be tested. If this parameter appears in the utility function as BETA [ SIGMA ], its name in this section must be typed BETA_SIGMA.
  2. A list of positive integers associated with a parameter. The integer is the degree of the Legendre polynomial, and the parameter the associated coefficient in the development. Note that the name of the parameter must appear in the section [Beta].
For instance, if the distribution of the parameter BETA [ SIGMA ] is tested using a seminonparametric development defined by

    1 + δ1 L1(x) + δ3 L3(x) + δ4 L4(x),

    the syntax in Biogeme is
    [Beta]
    // Name  Value LowerBound UpperBound  status (0=variable, 1=fixed)
    ....
       BETA  0     -10000     10000       0
       SIGMA 1     -10000     10000       0  
       SMP1  0     -10000     10000       0
       SMP3  0     -10000     10000       0
       SMP4  0     -10000     10000       0
    
    [SNP]
    // Define the coefficients of the series 
    // generated by the Legendre polynomials
    BETA_SIGMA
    1 SMP1
    3 SMP3
    4 SMP4
    
    Note that only one random parameter can be transformed at a time.

[OrdinalLogit]

An ordinal binary choice model is derived when ordinal responses are available, where the respondent reports not only the preference, but also the strength of the preference. For instance, if alternatives i and j are available, the respondent can report one of the following:
  • definitely choose j;
  • probably choose j;
  • indifferent;
  • probably choose i;
  • definitely choose i.
As for the binary choice model, the selected category is explained by the difference $U_{in} - U_{jn}$ between the utilities of the two alternatives. Here, however, more than two responses are possible. Formally, we consider $Q \geq 2$ categories, ordered such that category $q$ corresponds to a stronger preference towards alternative $i$ than category $q-1$, for $q = 2, \ldots, Q$. We define $Q+1$ parameters $\tau_q$, $q = 0, \ldots, Q$, such that $\tau_0 = -\infty$, $\tau_Q = +\infty$, and $\tau_{q-1} \leq \tau_q$, $q = 1, \ldots, Q$. Category $q$ is associated with the interval $[\tau_{q-1}, \tau_q]$. The probability for category $q$ to be selected by the respondent is

$P_n(q) = P(\tau_{q-1} \leq U_{in} - U_{jn} \leq \tau_q)$
$= P(\tau_{q-1} \leq (V_{in} - V_{jn}) - (\varepsilon_{jn} - \varepsilon_{in}) \leq \tau_q)$
$= P(V_{in} - V_{jn} - \tau_q \leq \varepsilon_n \leq V_{in} - V_{jn} - \tau_{q-1})$
$= F_{\varepsilon_n}(V_{in} - V_{jn} - \tau_{q-1}) - F_{\varepsilon_n}(V_{in} - V_{jn} - \tau_q)$
where $\varepsilon_n = \varepsilon_{jn} - \varepsilon_{in}$, and $F_{\varepsilon_n}$ is the CDF of $\varepsilon_n$. In particular, if $\varepsilon_n$ is logistically distributed, we obtain the ordinal logit model. We immediately note that binary choice models are specific instances of ordinal binary choice models, with two categories ($Q = 2$) and $\tau_1 = 0$. The parameters $\tau$ of ordinal binary logit models can be estimated using this section of the model specification file. The segments of the utility difference space must be numbered sequentially, increasing from the leftmost to the rightmost. In this section, each segment must be associated with its lower bound, except the first (because its lower bound is $-\infty$). For instance, if there are 4 segments, the following syntax is used:
[Beta]
....
tau1 0.3 -1000 1000 1
tau2 0.4 -1000 1000 0 
tau3 0.5 -1000 1000 0 

[OrdinalLogit]
1 $NONE    //  -infty --> tau1 
2 tau1     //  tau1   --> tau2
3 tau2     //  tau2   --> tau3
4 tau3     //  tau3   --> +infty

[LinearConstraints]
tau1 - tau2 <= 0
tau2 - tau3 <= 0
Note that the constraints ensure that the segments are well defined. Recall also that the characters // start a comment: they, and all remaining characters on the same line, are not interpreted by Biogeme. Therefore, the following syntax for that section is completely equivalent:
[OrdinalLogit]
1 $NONE
2 tau1 
3 tau2 
4 tau3 
However, we strongly advise using comments in order to clearly identify the segments. The numbering of the alternatives in an ordinal logit model specification is required in order to comply with the Biogeme syntax, but is used in a completely different way than for choice models. Exactly two alternatives must be specified in the section [Utilities]. It is advised to number them 1 and 2, and to call them Alt1 and Alt2. This numbering is used only to compute the utility difference U1-U2. To avoid any ambiguity about the sign of the difference (U1-U2 or U2-U1), the formula actually used is reported in the output file. The value of the variable defined in the section [Choice] identifies the category actually selected by the respondent.
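
Returning to the probability of category q derived earlier in this section, the logit case has a closed form. The following is a sketch under the assumption that the error term follows a standard logistic distribution with unit scale, so that its CDF is $F(x) = 1/(1 + e^{-x})$; this normalization is an assumption made here for illustration and is not stated in this document. Writing $\Delta V_n = V_{in} - V_{jn}$:

$P_n(q) = \frac{1}{1 + e^{-(\Delta V_n - \tau_{q-1})}} - \frac{1}{1 + e^{-(\Delta V_n - \tau_q)}}, \quad q = 1, \ldots, Q$

For the two extreme categories, the conventions $\tau_0 = -\infty$ and $\tau_Q = +\infty$ give

$P_n(1) = 1 - \frac{1}{1 + e^{-(\Delta V_n - \tau_1)}}$
$P_n(Q) = \frac{1}{1 + e^{-(\Delta V_n - \tau_{Q-1})}}$

Summing over $q$, consecutive terms cancel, so the probabilities of the Q categories add up to 1.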

[SampleEnum]

This section is ignored by Biogeme. It is used by Biosim and contains the number of simulations to perform in the sample enumeration step.

[ZhengFosgerau]

This section is ignored by Biogeme. It is used by Biosim and contains instructions to perform the Zheng-Fosgerau specification test and residual analysis. Make sure to read the paper by Fosgerau, 2008 before using this section. There is a line for each test, containing five items:
  1. The first item defines the function t introduced by Fosgerau, 2008 to reduce the dimensionality of the test. It is typically either the probability of an alternative, or an expression involving coefficients and attributes of the model, as long as the expression is continuous and not discrete. If it is a probability, the syntax is
    $P { AltName }
    
    where AltName is the name of the alternative as defined in the section [Utilities]. If it is a general expression, the syntax is
    $E { expr }
    
    where expr is an expression complying with the syntax of the section [Expressions]. However, it may also contain estimated parameters.
  2. The second item is a parameter c used to define the bandwidth for the nonparametric regression performed by the test (see end of Section 2.1 in Fosgerau, 2008). The bandwidth used by Biosim is defined as c/√n, where n is the sample size. Most users will use the value c=1.
  3. The third and fourth items are the lower and upper bounds, respectively. Values of t outside of the bounds will not be used in the produced pictures. It is good practice to use wide bounds first, and to adjust them in order to obtain decent pictures. Note that if t is a probability, it does not make sense to have bounds wider than [0, 1].
  4. The last item is the name of the function t, used in the report. Make sure to put the name between double-quotes.
Here is an example of the syntax:
[ZhengFosgerau]
$P { Alt1 } 1 0 1 "P1"
$E { x31 } 1 -1000 1000 "x31"  

[IIATest]

This section is ignored by Biogeme. It is used by Biosim to compute the variables necessary to perform the McFadden omitted variables test on a subset of alternatives.

Suppose that we have estimated a logit model, using all the observations. Denote by Pin the probability given by this model that individual n in the sample chooses alternative i.

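Consider a given subset of alternatives $\tilde{C} \subseteq C$, where $C$ is the full choice set. Define the new variables

$z_{in} = V_{in} - \frac{\sum_{j \in \tilde{C}} P_{jn} V_{jn}}{\sum_{j \in \tilde{C}} P_{jn}}$ if $i \in \tilde{C}$, and $z_{in} = 0$ otherwise.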

Estimate the same model as before where the new variables have been also included in the specification. Testing if IIA holds is equivalent to testing if all the coefficients of the new variables are 0, which can be performed with a likelihood ratio test.

Biosim can compute these variables. The syntax is illustrated by the following example.
[IIATest]
// Description of the choice subsets to compute the new 
// variable for McFadden's IIA test
// Name list_of_alt
C123 1 2 3
C345 3 4 5
Each row corresponds to a new variable. It consists of the name of the variable (which will appear as the column header in the output of Biosim), followed by the list of alternatives to be included in the associated subset.
