Tools

Generic convenience tools.

biogeme.tools module

Implements some useful functions

author: Michel Bierlaire
date: Sun Apr 14 10:46:10 2019

class biogeme.tools.LRTuple(message, statistic, threshold)

Bases: tuple

message: Alias for field number 0

statistic: Alias for field number 1

threshold: Alias for field number 2

class biogeme.tools.ModelNames(prefix='Model')

Bases: object

Class generating model names from unique configuration strings

__init__(prefix='Model')

class biogeme.tools.TemporaryFile

Bases: object

Class generating a temporary file, so that the user does not need to worry about its location, or even its name

Example:

with TemporaryFile() as filename:
    with open(filename, 'w') as f:
        print('stuff', file=f)
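
For readers curious how such a context manager can work, here is a minimal sketch built on the standard library. The class name TemporaryFileSketch and the file name scratch.txt are hypothetical and chosen for illustration only; this is not the biogeme implementation.

```python
import os
import shutil
import tempfile


class TemporaryFileSketch:
    """Hypothetical stand-in for illustration, not the biogeme class."""

    def __enter__(self):
        # Create a private directory and hand back a file path inside it.
        self.directory = tempfile.mkdtemp()
        return os.path.join(self.directory, 'scratch.txt')

    def __exit__(self, exc_type, exc_value, traceback):
        # Remove the directory and its content, even if an error occurred.
        shutil.rmtree(self.directory)
        return False  # do not swallow exceptions


with TemporaryFileSketch() as filename:
    with open(filename, 'w') as f:
        print('stuff', file=f)
    with open(filename) as f:
        content = f.read()
```

Once the with block exits, the file and its directory no longer exist, so nothing has to be cleaned up by hand.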

biogeme.tools.calculatePrimeNumbers(upperBound)

Calculate prime numbers

Parameters: upperBound (int) – prime numbers up to this value will be computed
Returns: array with prime numbers
Return type: list(int)
Raises: BiogemeError – if the upperBound is incorrectly defined (e.g., a negative number)

>>> tools.calculatePrimeNumbers(10)
[2, 3, 5, 7]
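
One common way to obtain all primes up to a bound is the sieve of Eratosthenes. The sketch below is an illustration only: the source does not say which algorithm biogeme uses, the name primes_up_to is hypothetical, and it raises ValueError rather than BiogemeError.

```python
from math import isqrt


def primes_up_to(upper_bound):
    """Sieve of Eratosthenes (illustrative, not the biogeme code)."""
    if upper_bound < 0:
        raise ValueError('upper_bound must be non-negative')
    # is_prime[n] stays True until n is found to be a multiple of a prime.
    is_prime = [True] * (upper_bound + 1)
    is_prime[0:2] = [False] * min(2, upper_bound + 1)
    for candidate in range(2, isqrt(upper_bound) + 1):
        if is_prime[candidate]:
            # Cross out multiples, starting at candidate squared.
            for multiple in range(
                candidate * candidate, upper_bound + 1, candidate
            ):
                is_prime[multiple] = False
    return [n for n, flag in enumerate(is_prime) if flag]


primes = primes_up_to(10)  # [2, 3, 5, 7]
```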

biogeme.tools.checkDerivatives(theFunction, x, names=None, logg=False)

Verifies the analytical derivatives of a function by comparing them with finite difference approximations.

Parameters:

theFunction (function) – a function object that takes a vector as an argument, and returns a tuple:
- the first element of the tuple is the value of the function \(f\),
- the second is the gradient of the function,
- the third is the hessian.
x (numpy.array) – arguments of the function
names (list(string)) – the names of the entries of x (for reporting).
logg (bool) – if True, messages will be displayed.

Returns:

tuple f, g, h, gdiff, hdiff where

f is the value of the function at x,
g is the analytical gradient,
h is the analytical hessian,
gdiff is the difference between the analytical gradient and the finite difference approximation,
hdiff is the difference between the analytical hessian and the finite difference approximation.

Return type:

float, numpy.array, numpy.array, numpy.array, numpy.array
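
To illustrate what such a check catches, the sketch below defines a function with the expected (f, g, h) return shape and a deliberate sign error in its gradient, then compares the gradient with a forward-difference approximation. It uses only NumPy and does not call biogeme; the helper names and the step size 1e-7 are assumptions made for this example.

```python
import numpy as np


def buggy_function(x):
    """f(x) = x0^3 + x0 * x1^2, with a deliberate error in the gradient."""
    f = x[0] ** 3 + x[0] * x[1] ** 2
    # The second entry should be +2 * x0 * x1: the sign is wrong on purpose.
    g = np.array([3 * x[0] ** 2 + x[1] ** 2, -2 * x[0] * x[1]])
    h = np.array([[6 * x[0], 2 * x[1]], [2 * x[1], 2 * x[0]]])
    return f, g, h


def forward_difference_gradient(function, x, step=1e-7):
    """Approximate the gradient using only function values."""
    f0 = function(x)[0]
    gradient = np.zeros(len(x))
    for i in range(len(x)):
        shifted = np.array(x, dtype=float)  # fresh copy of x
        shifted[i] += step
        gradient[i] = (function(shifted)[0] - f0) / step
    return gradient


x = np.array([1.5, -2.0])
f, g, h = buggy_function(x)
gdiff = g - forward_difference_gradient(buggy_function, x)
```

Here the first entry of gdiff is close to zero, while the second is close to 12, which points directly at the faulty derivative.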

biogeme.tools.countNumberOfGroups(df, column)

This function counts the number of groups of same value in a column. For instance: 1,2,2,3,3,3,4,1,1 would give 5.

Example:

>>> df = pd.DataFrame({'ID': [1, 1, 2, 3, 3, 1, 2, 3],
...                    'value': [1000,
...                              2000,
...                              3000,
...                              4000,
...                              5000,
...                              5000,
...                              10000,
...                              20000]})
>>> tools.countNumberOfGroups(df, 'ID')
6

>>> tools.countNumberOfGroups(df, 'value')
7
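
The counting rule itself (a new group starts whenever the value differs from the previous one) can be sketched in plain Python. The helper count_runs is hypothetical and works on a list of values rather than on a data frame column.

```python
from itertools import groupby


def count_runs(values):
    """Count runs of consecutive identical values."""
    # groupby yields one group per run of equal adjacent items.
    return sum(1 for _ in groupby(values))


runs_intro = count_runs([1, 2, 2, 3, 3, 3, 4, 1, 1])  # 5
runs_id = count_runs([1, 1, 2, 3, 3, 1, 2, 3])  # 6
```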

biogeme.tools.findiff_H(theFunction, x)

Calculates the hessian of a function \(f\) using finite differences

Parameters:

theFunction (function) – A function object that takes a vector as an argument, and returns a tuple. The first element of the tuple is the value of the function \(f\), and the second is the gradient of the function. The other elements are not used.
x (numpy.array) – argument of the function

Returns:

numpy matrix containing the hessian calculated by finite differences.

Return type:

numpy.array
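
As an illustration of the idea, a Hessian can be approximated by applying forward differences to the analytical gradient, one coordinate at a time. The sketch below assumes a fixed step of 1e-7 and hypothetical helper names; the step selection used by biogeme is not described here and may differ.

```python
import numpy as np


def value_and_gradient(x):
    """f(x) = x0^3 + x0 * x1^2 and its analytical gradient."""
    f = x[0] ** 3 + x[0] * x[1] ** 2
    g = np.array([3 * x[0] ** 2 + x[1] ** 2, 2 * x[0] * x[1]])
    return f, g


def forward_difference_hessian(function, x, step=1e-7):
    """Approximate the Hessian by differencing the gradient."""
    n = len(x)
    g0 = function(x)[1]
    hessian = np.zeros((n, n))
    for i in range(n):
        shifted = np.array(x, dtype=float)  # fresh copy of x
        shifted[i] += step
        # Column i approximates the derivative of the gradient along x_i.
        hessian[:, i] = (function(shifted)[1] - g0) / step
    return hessian


x = np.array([1.5, -2.0])
h_approx = forward_difference_hessian(value_and_gradient, x)
# Exact Hessian at x: [[6*x0, 2*x1], [2*x1, 2*x0]]
h_exact = np.array([[9.0, -4.0], [-4.0, 3.0]])
```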

biogeme.tools.findiff_g(theFunction, x)

Calculates the gradient of a function \(f\) using finite differences

Parameters:

theFunction (function) – A function object that takes a vector as an argument, and returns a tuple. The first element of the tuple is the value of the function \(f\). The other elements are not used.
x (numpy.array) – argument of the function

Returns:

numpy vector, same dimension as x, containing the gradient calculated by finite differences.

Return type:

numpy.array

biogeme.tools.flatten_database(df, merge_id, row_name=None, identical_columns=None)

Combine several rows of a Pandas database into one. For instance, consider the following database:

 ID  Age  Cost   Name
  1   23    34  Item3
  1   23    45  Item4
  1   23    12  Item7
  2   45    65  Item3
  2   45    34  Item7

If row_name is ‘Name’, the function generates the same data in the following format:

    Age  Item3_Cost  Item4_Cost  Item7_Cost
ID
1    23          34        45.0          12
2    45          65         NaN          34

If row_name is None, the function generates the same data in the following format:

    Age  1_Cost 1_Name  2_Cost 2_Name  3_Cost 3_Name
ID
1    23      34  Item3      45  Item4    12.0  Item7
2    45      65  Item3      34  Item7     NaN    NaN

Parameters:

df (pandas.DataFrame) – initial data frame
merge_id (str) – name of the column that identifies rows that should be merged. In the above example: ‘ID’
row_name (str) – name of the column that provides the name of the rows in the new dataframe. In the example above: ‘Name’. If None, the rows are numbered sequentially.
identical_columns (list(str)) – names of the columns that contain identical values across the rows of a group. In the example above: [‘Age’]. If None, these columns are automatically detected. On a large database, this automatic detection may cause a performance issue.

Returns:

reformatted database

Return type:

pandas.DataFrame
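
A similar wide layout can be obtained directly with pandas, which may help in understanding what the function produces when row_name is ‘Name’. This is a rough sketch that assumes ‘Age’ is the only identical column and that each (ID, Name) pair occurs once; it is not the biogeme implementation, and the column types may differ from the output shown above.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'ID': [1, 1, 1, 2, 2],
        'Age': [23, 23, 23, 45, 45],
        'Cost': [34, 45, 12, 65, 34],
        'Name': ['Item3', 'Item4', 'Item7', 'Item3', 'Item7'],
    }
)

# Columns that are identical within a group: keep the first value.
identical = df.groupby('ID')['Age'].first()

# One column per value of 'Name', filled with the matching 'Cost'.
wide = df.pivot(index='ID', columns='Name', values='Cost')
wide.columns = [f'{name}_Cost' for name in wide.columns]

# Combinations that do not exist in the data (ID 2, Item4) become NaN.
flat = pd.concat([identical, wide], axis=1)
```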

biogeme.tools.generate_unique_ids(list_of_ids)

If there are duplicates in the list, a new list is generated where they are renamed to obtain a list with unique IDs.

Parameters: list_of_ids (list[str]) – list of ids
Returns: a dict that maps the unique names to the original names
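
The renaming scheme is not specified here. As one possible illustration, the sketch below appends a running counter to duplicated ids and leaves unique ids untouched. Both the suffix format and the function name unique_ids_sketch are assumptions, and the sketch does not guard against a generated name colliding with an existing id.

```python
from collections import Counter


def unique_ids_sketch(list_of_ids):
    """Map unique names to original names (hypothetical suffix scheme)."""
    counts = Counter(list_of_ids)
    seen = Counter()
    mapping = {}
    for name in list_of_ids:
        if counts[name] == 1:
            # Already unique: keep the name as it is.
            mapping[name] = name
        else:
            # Duplicate: append a running index (assumed format).
            seen[name] += 1
            mapping[f'{name}_{seen[name]}'] = name
    return mapping


mapping = unique_ids_sketch(['a', 'b', 'a'])
```

With this scheme, the example yields the keys a_1, b and a_2, which map back to a, b and a respectively.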

biogeme.tools.getPrimeNumbers(n)

Get a given number of prime numbers

Parameters: n (int) – number of primes that are requested
Returns: array with prime numbers
Return type: list(int)
Raises: BiogemeError – if the requested number is non-positive or is a float

biogeme.tools.likelihood_ratio_test(model1, model2, significance_level=0.05)

This function performs a likelihood ratio test between a restricted and an unrestricted model.

Parameters:

model1 (tuple(float, int)) – the final loglikelihood of one model, and the number of estimated parameters.
model2 (tuple(float, int)) – the final loglikelihood of the other model, and the number of estimated parameters.
significance_level (float) – level of significance of the test. Default: 0.05

Returns:

a tuple containing:

a message with the outcome of the test
the statistic, that is, minus two times the difference between the loglikelihood of the restricted model and the loglikelihood of the unrestricted model
the threshold of the chi square distribution.

Return type:

LRTuple(str, float, float)

Raises:

BiogemeError – if the unrestricted model has a lower log likelihood than the restricted model.
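
The computation behind the test can be sketched with SciPy. The sketch assumes that the model with fewer parameters is the restricted one and that the two models differ in their number of parameters. It returns a boolean instead of a message and raises ValueError rather than BiogemeError; the function name and the numbers in the example are made up for illustration.

```python
from scipy.stats import chi2


def lr_test_sketch(model1, model2, significance_level=0.05):
    """Likelihood ratio test on two (loglikelihood, n_parameters) tuples."""
    # Assumption: the model with fewer parameters is the restricted one.
    restricted, unrestricted = sorted([model1, model2], key=lambda m: m[1])
    loglike_r, k_r = restricted
    loglike_u, k_u = unrestricted
    if loglike_u < loglike_r:
        raise ValueError(
            'The unrestricted model has a lower log likelihood.'
        )
    # Minus two times (restricted minus unrestricted log likelihood).
    statistic = -2 * (loglike_r - loglike_u)
    # Critical value of the chi-square distribution, with as many degrees
    # of freedom as there are additional parameters.
    threshold = chi2.ppf(1 - significance_level, k_u - k_r)
    reject_restricted = bool(statistic > threshold)
    return reject_restricted, statistic, threshold


# Hypothetical values, for illustration only.
reject, statistic, threshold = lr_test_sketch((-1340.8, 2), (-1330.5, 4))
```

In this made-up example the statistic exceeds the threshold, so the restricted model would be rejected at the 5% level.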