.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/tutorials/plot_b01_first_model.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_tutorials_plot_b01_first_model.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_tutorials_plot_b01_first_model.py:


Estimation of a binary logit model
==================================

Example extracted from Ben-Akiva and Lerman (1985)

Michel Bierlaire, EPFL
Thu May 16 11:59:49 2024

.. GENERATED FROM PYTHON SOURCE LINES 12-22

.. code-block:: Python


    import pandas as pd
    from IPython.core.display_functions import display

    from biogeme.biogeme import BIOGEME
    from biogeme.database import Database
    from biogeme.expressions import Beta, Variable
    from biogeme.models import loglogit
    from biogeme.results_processing import get_pandas_estimated_parameters


.. GENERATED FROM PYTHON SOURCE LINES 23-26

The data set is organized as a Pandas data frame. In this simple example, the data is provided directly in the
script. Most of the time, the data is available from a file or an external database, and must first be imported
into Pandas.

.. GENERATED FROM PYTHON SOURCE LINES 26-85

.. code-block:: Python

    data = {
        'ID': pd.Series([i + 1 for i in range(21)]),
        'auto_time': pd.Series(
            [
                52.9,
                4.1,
                4.1,
                56.2,
                51.8,
                0.2,
                27.6,
                89.9,
                41.5,
                95.0,
                99.1,
                18.5,
                82.0,
                8.6,
                22.5,
                51.4,
                81.0,
                51.0,
                62.2,
                95.1,
                41.6,
            ]
        ),
        'transit_time': pd.Series(
            [
                4.4,
                28.5,
                86.9,
                31.6,
                20.2,
                91.2,
                79.7,
                2.2,
                24.5,
                43.5,
                8.4,
                84.0,
                38.0,
                1.6,
                74.1,
                83.8,
                19.2,
                85.0,
                90.1,
                22.2,
                91.5,
            ]
        ),
        'choice': pd.Series(
            [1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0]
        ),
    }
    pandas_dataframe = pd.DataFrame(data)
    display(pandas_dataframe)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

        ID  auto_time  transit_time  choice
    0    1       52.9           4.4       1
    1    2        4.1          28.5       1
    2    3        4.1          86.9       0
    3    4       56.2          31.6       1
    4    5       51.8          20.2       1
    5    6        0.2          91.2       0
    6    7       27.6          79.7       0
    7    8       89.9           2.2       1
    8    9       41.5          24.5       1
    9   10       95.0          43.5       1
    10  11       99.1           8.4       1
    11  12       18.5          84.0       0
    12  13       82.0          38.0       0
    13  14        8.6           1.6       1
    14  15       22.5          74.1       0
    15  16       51.4          83.8       0
    16  17       81.0          19.2       1
    17  18       51.0          85.0       0
    18  19       62.2          90.1       0
    19  20       95.1          22.2       1
    20  21       41.6          91.5       0


.. GENERATED FROM PYTHON SOURCE LINES 86-87

The data frame is used to initialize the Biogeme database.

.. GENERATED FROM PYTHON SOURCE LINES 87-89

.. code-block:: Python

    biogeme_database = Database('ben_akiva_lerman', pandas_dataframe)


.. GENERATED FROM PYTHON SOURCE LINES 90-96

The next step is to provide the model specification:

- the explanatory variables,
- the parameters to be estimated,
- the specification of the utility functions,
- the specification of the choice model.

.. GENERATED FROM PYTHON SOURCE LINES 98-101

Explanatory variables: the object `Variable` associates the name of a column in the database with a Python variable,
which is then used in the utility specification. In this example, we have three variables: two independent
variables, and one dependent variable, that is, the choice.

.. GENERATED FROM PYTHON SOURCE LINES 101-105

.. code-block:: Python

    auto_time = Variable('auto_time')
    transit_time = Variable('transit_time')
    choice = Variable('choice')


.. GENERATED FROM PYTHON SOURCE LINES 106-117

Parameters to be estimated: the object `Beta` identifies the parameters to be estimated. It accepts 5 arguments:

- the name of the parameter (used for reporting),
- the initial value (used as a starting point by the optimization algorithm),
- a lower bound on its value, or None if there is no such bound,
- an upper bound on its value, or None if there is no such bound,
- a status, which is either 0 or 1. If zero, the value of the parameter is estimated. If one, the value stays
  fixed to the initial value provided.

Although not formally necessary, it is good practice to use the exact same name for the Python variable and the
parameter itself.

.. GENERATED FROM PYTHON SOURCE LINES 119-121

First, we define the alternative specific constant. We estimate the one associated with the car alternative. The
one associated with transit is normalized to zero and, therefore, does not appear in the model.
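
The normalization relies on the fact that logit probabilities depend only on differences of utilities. The following
minimal sketch in plain Python (it does not use Biogeme; the helper `logit_probabilities` and the utility values are
hypothetical) illustrates that adding the same constant to both utilities leaves the choice probabilities unchanged,
so only one of the two constants can be identified.

.. code-block:: Python

    import math


    def logit_probabilities(utilities):
        """Return logit probabilities for a dict {alternative_id: utility}."""
        # Subtract the largest utility before exponentiating, for numerical stability.
        max_utility = max(utilities.values())
        exp_utilities = {
            alt: math.exp(value - max_utility) for alt, value in utilities.items()
        }
        denominator = sum(exp_utilities.values())
        return {alt: value / denominator for alt, value in exp_utilities.items()}


    # Hypothetical utility values, for illustration only.
    base_utilities = {0: -1.0, 1: 0.5}
    # Add the same constant to every alternative.
    shifted_utilities = {alt: value + 3.0 for alt, value in base_utilities.items()}

    base_probabilities = logit_probabilities(base_utilities)
    shifted_probabilities = logit_probabilities(shifted_utilities)
    print(base_probabilities)
    print(shifted_probabilities)

Both dictionaries contain the same probabilities, which is why one constant can be set to zero without loss of
generality.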

.. GENERATED FROM PYTHON SOURCE LINES 121-123

.. code-block:: Python

    asc_car = Beta('asc_car', 0, None, None, 0)


.. GENERATED FROM PYTHON SOURCE LINES 124-125

Second, we define the coefficient of travel time.

.. GENERATED FROM PYTHON SOURCE LINES 125-127

.. code-block:: Python

    b_time = Beta('b_time', 0, None, None, 0)


.. GENERATED FROM PYTHON SOURCE LINES 128-129

We are now ready to specify the utility functions.

.. GENERATED FROM PYTHON SOURCE LINES 129-132

.. code-block:: Python

    utility_car = asc_car + b_time * auto_time
    utility_transit = b_time * transit_time
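
As an illustration, the two utilities can be evaluated by hand for the first observation of the data set. The sketch
below uses plain Python floats instead of Biogeme expressions (the names ending in `_value` are hypothetical), and
takes the estimates reported at the end of this example, rounded to six decimals.

.. code-block:: Python

    # Estimates taken from the table of estimated parameters at the end of this example.
    asc_car_value = -0.237573
    b_time_value = -0.053110

    # First observation of the data set.
    auto_time_value = 52.9
    transit_time_value = 4.4

    # Same formulas as the Biogeme expressions, applied to plain numbers.
    utility_car_value = asc_car_value + b_time_value * auto_time_value
    utility_transit_value = b_time_value * transit_time_value

    print(f'Utility of car:     {utility_car_value:.6f}')
    print(f'Utility of transit: {utility_transit_value:.6f}')

For this observation, transit has the larger utility, which is consistent with the observed choice.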


.. GENERATED FROM PYTHON SOURCE LINES 133-135

Next, we need to associate each utility function with the ID of the corresponding alternative. This mapping is
necessary to interpret the value of the variable `choice` correctly: here, 0 identifies the car alternative and 1
the transit alternative. We use a Python dictionary to do that.

.. GENERATED FROM PYTHON SOURCE LINES 135-137

.. code-block:: Python

    utilities = {0: utility_car, 1: utility_transit}


.. GENERATED FROM PYTHON SOURCE LINES 138-147

To finish the specification of the model, we need to provide an expression for the contribution of each observation
to the log-likelihood function. As this is typically the logarithm of the choice probability, we need to select
a choice model. In this example, we select the logit model. We use the `loglogit` function to obtain the logarithm
of the choice probability. It takes three arguments:

- a dictionary with the specification of the utility functions,
- a dictionary with the availability conditions. In this simple example, both alternatives are always available,
  so there is no need to provide it, and `None` is used instead,
- the choice variable.

.. GENERATED FROM PYTHON SOURCE LINES 147-149

.. code-block:: Python

    log_choice_probability = loglogit(utilities, None, choice)
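
To illustrate what this expression evaluates to, the sketch below computes the logarithm of the logit choice
probability for each observation in plain Python and sums the results over the sample. It does not use Biogeme: the
helper `log_logit` is a hypothetical stand-in, and the parameter values are the estimates reported at the end of this
example, rounded to six decimals.

.. code-block:: Python

    import math

    # Data of the example: travel times and observed choices (0: car, 1: transit).
    auto_time_data = [
        52.9, 4.1, 4.1, 56.2, 51.8, 0.2, 27.6, 89.9, 41.5, 95.0, 99.1,
        18.5, 82.0, 8.6, 22.5, 51.4, 81.0, 51.0, 62.2, 95.1, 41.6,
    ]
    transit_time_data = [
        4.4, 28.5, 86.9, 31.6, 20.2, 91.2, 79.7, 2.2, 24.5, 43.5, 8.4,
        84.0, 38.0, 1.6, 74.1, 83.8, 19.2, 85.0, 90.1, 22.2, 91.5,
    ]
    choice_data = [1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0]


    def log_logit(utilities, chosen):
        """Log of the logit probability of the chosen alternative."""
        # Log-sum-exp, shifted by the largest utility for numerical stability.
        max_utility = max(utilities.values())
        log_sum = max_utility + math.log(
            sum(math.exp(value - max_utility) for value in utilities.values())
        )
        return utilities[chosen] - log_sum


    # Rounded estimates taken from the results table of this example.
    asc_car_value = -0.237573
    b_time_value = -0.053110

    log_likelihood = sum(
        log_logit(
            {0: asc_car_value + b_time_value * auto, 1: b_time_value * transit},
            chosen,
        )
        for auto, transit, chosen in zip(auto_time_data, transit_time_data, choice_data)
    )
    print(f'Sample log likelihood: {log_likelihood:.4f}')

Up to the rounding of the parameter values, the sum should agree with the final log likelihood reported in the
estimation summary.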


.. GENERATED FROM PYTHON SOURCE LINES 150-152

All the ingredients are now ready. We put them together into the `BIOGEME` object. We create it by providing both
the database and the model specification.

.. GENERATED FROM PYTHON SOURCE LINES 152-154

.. code-block:: Python

    biogeme_object = BIOGEME(biogeme_database, log_choice_probability)


.. GENERATED FROM PYTHON SOURCE LINES 155-158

It is recommended to provide a name to the model. Indeed, the estimation results will be saved in two files: a
"human-readable" HTML file, and a Python-specific format called `pickle` so that existing estimation results can
be read from file instead of being re-estimated.

.. GENERATED FROM PYTHON SOURCE LINES 158-160

.. code-block:: Python

    biogeme_object.model_name = 'first_model'


.. GENERATED FROM PYTHON SOURCE LINES 161-164

It is good practice to calculate the log likelihood of the null model, used as a benchmark for the general statistics.
This quantity is calculated based only on the choice set. This is why the availability of the alternatives must be
provided as an argument. In this case, both alternatives are always available, so each of them is associated with
the value 1.

.. GENERATED FROM PYTHON SOURCE LINES 164-166

.. code-block:: Python

    biogeme_object.calculate_null_loglikelihood(avail={0: 1, 1: 1})


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    -14.556090791758852
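
Assuming that the null model assigns the same probability to every available alternative, this value can be
reproduced by hand: each of the 21 observations has two available alternatives, so each contributes the logarithm of
one half. The variable names in this sketch are hypothetical.

.. code-block:: Python

    import math

    number_of_observations = 21
    number_of_alternatives = 2

    # Each observation contributes log(1 / number of available alternatives).
    null_loglikelihood = number_of_observations * math.log(1 / number_of_alternatives)
    print(null_loglikelihood)
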


.. GENERATED FROM PYTHON SOURCE LINES 167-168

Finally, we run the estimation algorithm to obtain the estimates of the coefficients.

.. GENERATED FROM PYTHON SOURCE LINES 168-170

.. code-block:: Python

    results = biogeme_object.estimate()


.. GENERATED FROM PYTHON SOURCE LINES 171-173

The `results` object contains a great deal of information. In particular, it provides a summary of the
estimation results.

.. GENERATED FROM PYTHON SOURCE LINES 173-175

.. code-block:: Python

    print(results.short_summary())


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Results for model first_model
    Nbr of parameters:              2
    Sample size:                    21
    Excluded data:                  0
    Null log likelihood:            -14.55609
    Final log likelihood:           -6.166042
    Likelihood ratio test (null):           16.7801
    Rho square (null):                      0.576
    Rho bar square (null):                  0.439
    Akaike Information Criterion:   16.33208
    Bayesian Information Criterion: 18.42113
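
The statistics in this summary can be recomputed from the two log likelihood values, the number of parameters, and
the sample size. The sketch below uses the usual textbook definitions, which are assumed here to be the ones applied
by Biogeme; the variable names are hypothetical and the inputs are the rounded values printed above.

.. code-block:: Python

    import math

    null_loglikelihood = -14.55609
    final_loglikelihood = -6.166042
    number_of_parameters = 2
    sample_size = 21

    # Likelihood ratio test of the estimated model against the null model.
    likelihood_ratio = -2 * (null_loglikelihood - final_loglikelihood)
    # Rho square, and its version penalized by the number of parameters.
    rho_square = 1 - final_loglikelihood / null_loglikelihood
    rho_bar_square = 1 - (final_loglikelihood - number_of_parameters) / null_loglikelihood
    # Information criteria.
    akaike = 2 * number_of_parameters - 2 * final_loglikelihood
    bayesian = number_of_parameters * math.log(sample_size) - 2 * final_loglikelihood

    print(f'Likelihood ratio test (null):   {likelihood_ratio:.4f}')
    print(f'Rho square (null):              {rho_square:.3f}')
    print(f'Rho bar square (null):          {rho_bar_square:.3f}')
    print(f'Akaike Information Criterion:   {akaike:.5f}')
    print(f'Bayesian Information Criterion: {bayesian:.5f}')
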


.. GENERATED FROM PYTHON SOURCE LINES 176-177

.. code-block:: Python

    print(get_pandas_estimated_parameters(estimation_results=results))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

          Name     Value  Robust std err.  Robust t-stat.  Robust p-value
    0  asc_car -0.237573         0.805174       -0.295058        0.767950
    1   b_time -0.053110         0.021672       -2.450673        0.014259
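
The last two columns of this table can be checked from the first two. The sketch below divides each estimate by its
robust standard error to obtain the t-statistic, and assumes that the p-value comes from a two-sided test based on
the standard normal distribution, which appears consistent with the reported values. The helper `two_sided_p_value`
is hypothetical, and small discrepancies are expected because the inputs are rounded.

.. code-block:: Python

    import math


    def two_sided_p_value(t_statistic):
        """Two-sided p-value under a standard normal distribution."""
        # erfc(|t| / sqrt(2)) equals 2 * (1 - Phi(|t|)).
        return math.erfc(abs(t_statistic) / math.sqrt(2))


    # (estimate, robust standard error), copied from the table above.
    estimates = {
        'asc_car': (-0.237573, 0.805174),
        'b_time': (-0.053110, 0.021672),
    }

    statistics = {}
    for name, (value, std_err) in estimates.items():
        t_statistic = value / std_err
        statistics[name] = (t_statistic, two_sided_p_value(t_statistic))
        print(f'{name}: t-stat = {t_statistic:.4f}, p-value = {statistics[name][1]:.4f}')

Under these assumptions, only the travel time coefficient is significantly different from zero at the 5% level.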


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.593 seconds)


.. _sphx_glr_download_auto_examples_tutorials_plot_b01_first_model.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_b01_first_model.ipynb <plot_b01_first_model.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_b01_first_model.py <plot_b01_first_model.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_b01_first_model.zip <plot_b01_first_model.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_