Tools

Generic convenience tools.

biogeme.tools module

Implements some useful functions

author: Michel Bierlaire
date: Sun Apr 14 10:46:10 2019

class biogeme.tools.LRTuple(message, statistic, threshold)

Bases: tuple

message: Alias for field number 0

statistic: Alias for field number 1

threshold: Alias for field number 2

biogeme.tools.calculatePrimeNumbers(upperBound)[source]

Calculate prime numbers

Parameters: upperBound (int) – prime numbers up to this value will be computed
Returns: array with prime numbers
Return type: list(int)
Raises: biogemeError – if upperBound is invalid (e.g., a negative number)

>>> tools.calculatePrimeNumbers(10)
[2, 3, 5, 7]
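
For readers who want to see how such a list can be produced, here is a minimal sketch using the sieve of Eratosthenes. It is an illustration only and not necessarily the algorithm used by biogeme; the name `primes_up_to` is hypothetical, and it raises `ValueError` rather than `biogemeError`.

```python
def primes_up_to(upper_bound):
    """Return all primes <= upper_bound (hypothetical helper, not part of biogeme)."""
    if upper_bound < 0:
        raise ValueError(f"upper_bound must be non-negative, got {upper_bound}")
    is_prime = [True] * (upper_bound + 1)
    # 0 and 1 are not prime.
    for k in range(min(2, upper_bound + 1)):
        is_prime[k] = False
    p = 2
    while p * p <= upper_bound:
        if is_prime[p]:
            # Cross out every multiple of p, starting at p squared.
            for multiple in range(p * p, upper_bound + 1, p):
                is_prime[multiple] = False
        p += 1
    return [k for k, flag in enumerate(is_prime) if flag]


print(primes_up_to(10))  # [2, 3, 5, 7]
```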

biogeme.tools.calculate_correlation(nests, results, alternative_names=None)[source]

Calculate the correlation matrix of a nested or cross-nested logit model.

Parameters

nests (tuple(tuple(biogeme.expressions.Expression, list(int))), or tuple(tuple(biogeme.expressions.Expression, dict(int:biogeme.expressions.Expression)))) –
A tuple containing as many items as nests.

Each item is also a tuple containing two items:
- an object of type biogeme.expressions.expr.Expression representing the nest parameter,
- for the nested logit model, a list of the identifiers of the alternatives belonging to the nest,
- for the cross-nested logit model, a dictionary mapping the alternative IDs to the cross-nested parameters for the corresponding nest. If an alternative is missing from the dictionary, the corresponding alpha is set to zero.

Example for the nested logit:
nesta = MUA, [1, 2, 3]
nestb = MUB, [4, 5, 6]
nests = nesta, nestb

Example for the cross-nested logit:
```
alphaA = {1: alpha1a,
          2: alpha2a,
          3: alpha3a,
          4: alpha4a,
          5: alpha5a,
          6: alpha6a}
alphaB = {1: alpha1b,
          2: alpha2b,
          3: alpha3b,
          4: alpha4b,
          5: alpha5b,
          6: alpha6b}
nesta = MUA, alphaA
nestb = MUB, alphaB
nests = nesta, nestb
```
results (biogeme.results.bioResults) – estimation results
alternative_names (dict(int: str)) – a dictionary mapping the alternative IDs to their names. If None, the IDs are used as names.

biogeme.tools.checkDerivatives(theFunction, x, names=None, logg=False)[source]

Verifies the analytical derivatives of a function by comparing them with finite difference approximations.

Parameters

theFunction (function) –
A function object that takes a vector as an argument, and returns a tuple:
- The first element of the tuple is the value of the function \(f\),
- the second is the gradient of the function,
- the third is the hessian.
x (numpy.array) – arguments of the function
names (list(string)) – the names of the entries of x (for reporting).
logg (bool) – if True, messages will be displayed.

Returns

tuple f, g, h, gdiff, hdiff where

f is the value of the function at x,
g is the analytical gradient,
h is the analytical hessian,
gdiff is the difference between the analytical gradient and the finite difference approximation
hdiff is the difference between the analytical hessian and the finite difference approximation

Return type

float, numpy.array, numpy.array, numpy.array, numpy.array
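
The contract expected from theFunction can be illustrated with a small, dependency-free sketch. The function `quadratic` and the helper `forward_difference_gradient` are hypothetical names introduced here; plain lists stand in for numpy arrays, and the step size is an assumption. The sketch only reproduces the idea behind gdiff and does not call biogeme.

```python
def quadratic(x):
    """Hypothetical test function f(x) = x0^2 + 3*x0*x1 with exact derivatives."""
    f = x[0] ** 2 + 3.0 * x[0] * x[1]
    g = [2.0 * x[0] + 3.0 * x[1], 3.0 * x[0]]
    h = [[2.0, 3.0], [3.0, 0.0]]
    return f, g, h


def forward_difference_gradient(func, x, step=1e-7):
    """Approximate the gradient of func at x, one coordinate at a time."""
    f0 = func(x)[0]
    gradient = []
    for i in range(len(x)):
        shifted = list(x)
        shifted[i] += step
        gradient.append((func(shifted)[0] - f0) / step)
    return gradient


x = [1.0, 2.0]
f, g, h = quadratic(x)
g_fd = forward_difference_gradient(quadratic, x)
# Difference between the analytical gradient and its approximation.
gdiff = [analytical - approx for analytical, approx in zip(g, g_fd)]
print(gdiff)  # entries close to zero when the analytical gradient is right
```

Large entries in gdiff usually point to a mistake in the analytical gradient.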

biogeme.tools.correlation_cross_nested(nests)[source]

Calculate the correlation matrix of the error terms of all alternatives of a cross-nested logit model. It is assumed that the homogeneity parameter mu of the model has been normalized to one.

Parameters

nests (tuple) –

a tuple containing as many items as nests. Each item is also a tuple containing two items:

an object of type biogeme.expressions.expr.Expression representing the nest parameter,
a dictionary mapping the alternative IDs to the cross-nested parameters for the corresponding nest. If an alternative is missing from the dictionary, the corresponding alpha is set to zero.

Example:

alphaA = {1: alpha1a,
          2: alpha2a,
          3: alpha3a,
          4: alpha4a,
          5: alpha5a,
          6: alpha6a}
alphaB = {1: alpha1b,
          2: alpha2b,
          3: alpha3b,
          4: alpha4b,
          5: alpha5b,
          6: alpha6b}
nesta = MUA, alphaA
nestb = MUB, alphaB
nests = nesta, nestb

Returns

correlation matrix

Return type

pd.DataFrame

biogeme.tools.correlation_nested(nests)[source]

Calculate the correlation matrix of the error terms of all alternatives of a nested logit model. It is assumed that the homogeneity parameter mu of the model has been normalized to one.

Parameters

nests (tuple) –

A tuple containing as many items as nests. Each item is also a tuple containing two items:

an object of type biogeme.expressions.expr.Expression representing the nest parameter,
a list of the identifiers of the alternatives belonging to the nest.

Example:

nesta = MUA, [1, 2, 3]
nestb = MUB, [4, 5, 6]
nests = nesta, nestb

Returns

correlation matrix

Return type

pd.DataFrame
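
As an illustration of what the matrix contains, the sketch below uses the standard textbook result for the nested logit model: with mu normalized to one, two distinct alternatives in the same nest m have correlation 1 - 1/mu_m^2, and alternatives in different nests are uncorrelated. This is a simplified sketch under stated assumptions, not biogeme's implementation: the nest parameters are plain floats rather than Expression objects, the result is a nested dictionary rather than a pd.DataFrame, and the name `nested_logit_correlation` is hypothetical.

```python
def nested_logit_correlation(nests):
    """Correlation matrix, as nested dicts, for numeric nest parameters.

    Assumes mu = 1 and that each alternative belongs to exactly one nest.
    """
    alternatives = sorted(alt for _, members in nests for alt in members)
    # Start from the identity matrix: unit diagonal, zero elsewhere.
    corr = {
        i: {j: 1.0 if i == j else 0.0 for j in alternatives}
        for i in alternatives
    }
    for mu_m, members in nests:
        for i in members:
            for j in members:
                if i != j:
                    corr[i][j] = 1.0 - 1.0 / (mu_m * mu_m)
    return corr


nests = (2.0, [1, 2, 3]), (1.0, [4, 5, 6])
corr = nested_logit_correlation(nests)
print(corr[1][2])  # 0.75: same nest, nest parameter 2
print(corr[4][5])  # 0.0: same nest, but nest parameter 1
print(corr[1][4])  # 0.0: different nests
```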

biogeme.tools.countNumberOfGroups(df, column)[source]

This function counts the number of groups of same value in a column. For instance: 1,2,2,3,3,3,4,1,1 would give 5.

Example:

>>> df = pd.DataFrame({'ID': [1, 1, 2, 3, 3, 1, 2, 3],
...                    'value': [1000, 2000, 3000, 4000,
...                              5000, 5000, 10000, 20000]})
>>> tools.countNumberOfGroups(df, 'ID')
6

>>> tools.countNumberOfGroups(df, 'value')
7
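
The counting rule (a new group starts each time the value changes from one row to the next) can be reproduced on plain sequences with `itertools.groupby`. This is a minimal sketch; `count_runs` is a hypothetical name, and it works on lists rather than on a data frame column.

```python
from itertools import groupby


def count_runs(values):
    """Count groups of consecutive identical values."""
    # groupby starts a new group every time the value changes.
    return sum(1 for _ in groupby(values))


print(count_runs([1, 2, 2, 3, 3, 3, 4, 1, 1]))  # 5
print(count_runs([1, 1, 2, 3, 3, 1, 2, 3]))  # 6
```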

biogeme.tools.covariance_cross_nested(i, j, nests)[source]

Calculate the covariance between the error terms of two alternatives of a cross-nested logit model. It is assumed that the homogeneity parameter mu of the model has been normalized to one.

Parameters

i (int) – first alternative
j (int) – second alternative

nests (tuple) –

a tuple containing as many items as nests. Each item is also a tuple containing two items:

an object of type biogeme.expressions.expr.Expression representing the nest parameter,
a dictionary mapping the alternative IDs to the cross-nested parameters for the corresponding nest. If an alternative is missing from the dictionary, the corresponding alpha is set to zero.

Example:

alphaA = {1: alpha1a,
          2: alpha2a,
          3: alpha3a,
          4: alpha4a,
          5: alpha5a,
          6: alpha6a}
alphaB = {1: alpha1b,
          2: alpha2b,
          3: alpha3b,
          4: alpha4b,
          5: alpha5b,
          6: alpha6b}
nesta = MUA, alphaA
nestb = MUB, alphaB
nests = nesta, nestb

Returns

value of the covariance

Return type

float

Raises

biogemeError – if the requested number is non positive or a float

biogeme.tools.findiff_H(theFunction, x)[source]

Calculates the hessian of a function \(f\) using finite differences

Parameters

theFunction (function) – A function object that takes a vector as an argument, and returns a tuple. The first element of the tuple is the value of the function \(f\), and the second is the gradient of the function. The other elements are not used.
x (numpy.array) – argument of the function

Returns

numpy matrix containing the hessian calculated by finite differences.

Return type

numpy.array
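
One common way to build such an approximation is to apply forward differences to the gradient returned by theFunction, one coordinate at a time. The sketch below follows that idea under stated assumptions: the step size, the use of plain lists instead of numpy arrays, and the names `finite_difference_hessian` and `cubic` are all hypothetical, and biogeme's actual scheme may differ.

```python
def finite_difference_hessian(func, x, step=1e-7):
    """Approximate the Hessian by forward differences of the gradient func(x)[1]."""
    g0 = func(x)[1]
    n = len(x)
    hessian = [[0.0] * n for _ in range(n)]
    for j in range(n):
        shifted = list(x)
        shifted[j] += step
        gj = func(shifted)[1]
        # Column j holds the change of the gradient along coordinate j.
        for i in range(n):
            hessian[i][j] = (gj[i] - g0[i]) / step
    return hessian


def cubic(x):
    """Hypothetical test function f(x) = x0^3 + x0*x1^2 and its gradient."""
    f = x[0] ** 3 + x[0] * x[1] ** 2
    g = [3.0 * x[0] ** 2 + x[1] ** 2, 2.0 * x[0] * x[1]]
    return f, g


H = finite_difference_hessian(cubic, [1.0, 2.0])
print(H)  # close to the exact Hessian [[6, 4], [4, 2]]
```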

biogeme.tools.findiff_g(theFunction, x)[source]

Calculates the gradient of a function \(f\) using finite differences

Parameters

theFunction (function) – A function object that takes a vector as an argument, and returns a tuple. The first element of the tuple is the value of the function \(f\). The other elements are not used.
x (numpy.array) – argument of the function

Returns

numpy vector, same dimension as x, containing the gradient calculated by finite differences.

Return type

numpy.array

biogeme.tools.flatten_database(df, merge_id, row_name=None, identical_columns=None)[source]

Combine several rows of a Pandas database into one. For instance, consider the following database:

ID  Age  Cost   Name
 1   23    34  Item3
 1   23    45  Item4
 1   23    12  Item7
 2   45    65  Item3
 2   45    34  Item7

If row_name is ‘Name’, the function generates the same data in the following format:

    Age  Item3_Cost  Item4_Cost  Item7_Cost
ID
1    23          34        45.0          12
2    45          65         NaN          34

If row_name is None, the function generates the same data in the following format:

    Age  1_Cost 1_Name  2_Cost 2_Name  3_Cost 3_Name
ID
1    23      34  Item3      45  Item4    12.0  Item7
2    45      65  Item3      34  Item7     NaN    NaN

Parameters

df (pandas.DataFrame) – initial data frame
merge_id (str) – name of the column that identifies rows that should be merged. In the above example: ‘ID’
row_name (str) – name of the column that provides the name of the rows in the new dataframe. In the example above: ‘Name’. If None, the rows are numbered sequentially.
identical_columns (list(str)) – names of the columns that contain identical values across the rows of a group. In the example above: [‘Age’]. If None, these columns are automatically detected. On large databases, this detection may cause performance issues.

Returns

reformatted database

Return type

pandas.DataFrame
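
The reshaping logic for the case where row_name is given can be sketched without pandas, using a list of dictionaries as a stand-in for the data frame. This is a simplified illustration, not the library's implementation: `flatten_rows` is a hypothetical name, identical_columns must be supplied explicitly, and combinations that pandas would fill with NaN are simply absent from the result.

```python
def flatten_rows(rows, merge_id, row_name, identical_columns):
    """Merge rows sharing the same merge_id into one dictionary per id."""
    flat = {}
    for row in rows:
        entry = flat.setdefault(row[merge_id], {})
        for column, value in row.items():
            if column in (merge_id, row_name):
                continue
            if column in identical_columns:
                # Same value on every row of the group: keep a single copy.
                entry[column] = value
            else:
                # Prefix the column with the row's name, e.g. 'Item3_Cost'.
                entry[f"{row[row_name]}_{column}"] = value
    return flat


rows = [
    {"ID": 1, "Age": 23, "Cost": 34, "Name": "Item3"},
    {"ID": 1, "Age": 23, "Cost": 45, "Name": "Item4"},
    {"ID": 1, "Age": 23, "Cost": 12, "Name": "Item7"},
    {"ID": 2, "Age": 45, "Cost": 65, "Name": "Item3"},
    {"ID": 2, "Age": 45, "Cost": 34, "Name": "Item7"},
]
flat = flatten_rows(rows, "ID", "Name", ["Age"])
print(flat[2])  # {'Age': 45, 'Item3_Cost': 65, 'Item7_Cost': 34}
```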

biogeme.tools.getPrimeNumbers(n)[source]

Get a given number of prime numbers

Parameters: n (int) – number of primes that are requested
Returns: array with prime numbers
Return type: list(int)
Raises: biogemeError – if the requested number is non-positive or is a float

biogeme.tools.likelihood_ratio_test(model1, model2, significance_level=0.05)[source]

This function performs a likelihood ratio test between a restricted and an unrestricted model.

Parameters

model1 (tuple(float, int)) – the final log likelihood of one model and its number of estimated parameters.
model2 (tuple(float, int)) – the final log likelihood of the other model and its number of estimated parameters.
significance_level (float) – level of significance of the test. Default: 0.05

Returns

a tuple containing:

a message with the outcome of the test
the statistic, that is, minus two times the difference between the log likelihoods of the restricted and unrestricted models
the threshold of the chi square distribution.

Return type

LRTuple(str, float, float)

Raises

biogemeError – if the unrestricted model has a lower log likelihood than the restricted model.
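
The arithmetic of the test can be sketched as follows. The sketch assumes that the model with more estimated parameters is the unrestricted one, and that the number of degrees of freedom is the difference in parameter counts; neither is confirmed by the description above. The names `lr_statistic` and `CHI2_95_1DOF` are hypothetical, the log likelihood values are made up for illustration, and the threshold is hard-coded instead of being computed from the chi-square distribution.

```python
def lr_statistic(model1, model2):
    """Return (statistic, degrees_of_freedom) for two (loglikelihood, n_params) tuples."""
    # Assumption: the model with more parameters is the unrestricted one.
    restricted, unrestricted = sorted((model1, model2), key=lambda m: m[1])
    ll_r, k_r = restricted
    ll_u, k_u = unrestricted
    if ll_u < ll_r:
        raise ValueError(
            "The unrestricted model has a lower log likelihood "
            "than the restricted model."
        )
    statistic = -2.0 * (ll_r - ll_u)
    return statistic, k_u - k_r


# 95% quantile of the chi-square distribution with 1 degree of freedom.
CHI2_95_1DOF = 3.841

statistic, dof = lr_statistic((-1340.8, 5), (-1338.2, 6))
reject = statistic > CHI2_95_1DOF
print(round(statistic, 3), dof, reject)  # 5.2 1 True
```

In this example the statistic exceeds the threshold, so the restricted model is rejected at the 5% level.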