Embo: a Python package for empirical data analysis using the Information Bottleneck

Journal of Open Research Software

Volume 9 (2021): Issue 1

By: Eugenio Piasini, Alexandre L. S. Filipowicz, Jonathan Levine and Joshua I. Gold

Open Access

May 2021

Figures & Tables

Figure 1. From embo’s documentation (`examples/Basic-example.ipynb`). Top, red: IB curves for two simple synthetic datasets, one where both X and Y are binary (left column, “Two symbols”) and one where both can take 4 possible states (right column, “Four symbols”). Each dot represents the solution of Equation (1) for a particular value of β; the solid lines connecting the dots are added for legibility. Gray: identity line. Bottom: values of *I(M : Y)* and *I(M : X)* plotted against the corresponding values of β. See the software documentation for further detail on how these figures were generated. Note that the IB curve always lies below the identity line, and that *I(M : Y)* and *I(M : X)* never exceed the base-2 logarithm of the number of states (1 bit for 2 states, 2 bits for 4 states). The IB curve should always satisfy these conditions [1], so they can be taken as sanity checks for embo’s correct operation.
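The two sanity checks named in this caption follow from the Markov structure M ← X → Y: the data processing inequality gives *I(M : Y)* ≤ *I(M : X)*, and *I(M : X)* cannot exceed log2 of the number of states of X. The sketch below illustrates this on a toy problem using only the Python standard library. It does not use embo itself. The joint distribution `p_xy`, the encoder `p_m_given_x`, and the helper `mutual_information` are hypothetical, chosen only for illustration.

```python
from math import log2


def mutual_information(joint):
    """Mutual information in bits of a joint pmf given as a nested list joint[a][b]."""
    p_a = [sum(row) for row in joint]
    p_b = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for a, row in enumerate(joint):
        for b, p in enumerate(row):
            if p > 0:
                mi += p * log2(p / (p_a[a] * p_b[b]))
    return mi


# Hypothetical joint distribution p(x, y) for two binary variables (rows: x, columns: y).
p_xy = [[0.4, 0.1],
        [0.1, 0.4]]

# Hypothetical stochastic encoder p(m | x) (rows: x, columns: m); not an IB optimum.
p_m_given_x = [[0.9, 0.1],
               [0.2, 0.8]]

n_x, n_y, n_m = 2, 2, 2
p_x = [sum(row) for row in p_xy]

# Joint p(x, m) = p(x) p(m | x).
p_xm = [[p_x[x] * p_m_given_x[x][m] for m in range(n_m)] for x in range(n_x)]

# Joint p(m, y) = sum_x p(m | x) p(x, y), valid because M depends on Y only through X.
p_my = [[sum(p_m_given_x[x][m] * p_xy[x][y] for x in range(n_x))
         for y in range(n_y)] for m in range(n_m)]

i_mx = mutual_information(p_xm)
i_my = mutual_information(p_my)
i_xy = mutual_information(p_xy)

tol = 1e-12
print(f"I(M:X) = {i_mx:.4f} bits, I(M:Y) = {i_my:.4f} bits, I(X:Y) = {i_xy:.4f} bits")
print("below identity line:", i_my <= i_mx + tol)
print("bounded by log2(#states):", i_mx <= log2(n_x) + tol and i_my <= log2(n_y) + tol)
```

Any valid encoder should pass both checks, not only the optimal ones on the IB curve, which is why a violation in practice would point to a numerical or implementation problem.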

Figure 2. From embo’s documentation (`examples/Deterministic-Bottleneck.ipynb`): comparison of IB and DIB, similar to Figure 2 in [12]. In this example, X can take one of 128 possible states, Y can take one of 32 states, and *p(x)* is close to uniform (see the notebook for details about the joint *p(x, y)*). Left: IB and DIB solutions for a range of β values, visualized in the “IB plane”, where *I(M : Y)* is plotted against *I(M : X)*. Right: the same solutions as in the left panel, visualized in the “DIB plane”, where *I(M : Y)* is plotted against *H(M)*. As expected from [12], the two methods behave similarly in the IB plane. In the DIB plane, however, the DIB performs better than the IB, in the sense that *H(M)* is much lower for the DIB than for the IB at any given value of *I(M : Y)*.
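One way to read the gap between the two planes is through the identity *H(M)* = *I(M : X)* + *H(M | X)*: a deterministic encoder has *H(M | X)* = 0, so its position is the same in both planes, whereas a stochastic encoder pays an extra *H(M | X)* in the DIB plane. The sketch below checks this on a toy example using only the standard library. The distribution `p_x`, the two encoders, and the helper names are hypothetical, and neither encoder is claimed to be an IB or DIB solution.

```python
from math import log2


def entropy(p):
    """Shannon entropy in bits of a pmf given as a list of probabilities."""
    return -sum(q * log2(q) for q in p if q > 0)


def encoder_costs(p_x, p_m_given_x):
    """Return (H(M), I(M:X)) in bits for an encoder p(m | x) with rows indexed by x."""
    n_m = len(p_m_given_x[0])
    p_m = [sum(p_x[x] * p_m_given_x[x][m] for x in range(len(p_x)))
           for m in range(n_m)]
    h_m = entropy(p_m)
    h_m_given_x = sum(p_x[x] * entropy(p_m_given_x[x]) for x in range(len(p_x)))
    return h_m, h_m - h_m_given_x


# Hypothetical uniform source with four states.
p_x = [0.25, 0.25, 0.25, 0.25]

# Deterministic encoder: each x maps to exactly one m.
deterministic = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]

# Stochastic encoder: a noisy version of the same grouping.
stochastic = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]

h_det, i_det = encoder_costs(p_x, deterministic)
h_sto, i_sto = encoder_costs(p_x, stochastic)

print(f"deterministic: H(M) = {h_det:.4f}, I(M:X) = {i_det:.4f}")
print(f"stochastic:    H(M) = {h_sto:.4f}, I(M:X) = {i_sto:.4f}")
```

In this toy case both encoders produce the same marginal over M, so they have the same *H(M)*, but only the deterministic one attains *I(M : X)* = *H(M)*.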

Figure 3. From embo’s documentation (`examples/Compare-embo-dit.ipynb`): comparison of embo and dit [14] on sample IB problems of different dimensionality, defined as the number of possible states of the joint random variable *(X, Y)*. The problem with dimensionality 9 (where both X and Y have three possible states) is taken from the documentation of the current version of dit. Left: runtime vs dimensionality. The labels dit/sp and dit/ba indicate the algorithm used by dit: sp for `scipy.optimize` and ba for the Blahut-Arimoto algorithm. It was not possible to run dit on the smallest problem due to a software bug. Center: IB bound for the problem with dimensionality 9, computed with embo and dit. Embo and dit/sp (blue and orange) find the same solution, while dit/ba (green) finds a suboptimal one. Right: *I(M : X)* and *I(M : Y)* as a function of β. Note how dit/ba (green) becomes unstable at large β. See the notebook for more details.

DOI: https://doi.org/10.5334/jors.322 | Journal eISSN: 2049-9647

Language: English

Submitted on: Feb 4, 2020

Accepted on: May 13, 2021

Published on: May 31, 2021

Published by: Ubiquity Press

In partnership with: Paradigm Publishing Services

Publication frequency: 1 issue per year

Keywords: Information theory, Python, Information Bottleneck, Deterministic Information Bottleneck, data analysis, statistics

© 2021 Eugenio Piasini, Alexandre L. S. Filipowicz, Jonathan Levine, Joshua I. Gold, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.
