knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
library(rinform)
An R wrapper for the Inform v1.0.0 C library developed by Douglas G. Moore for performing information analysis of complex systems. As with the Inform library, rinform is structured around the concepts of:
In addition to the core components, rinform also provides a small collection of utilities to deal with time series.
If you are using rinform, consider citing the following articles:
Acknowledgement: This project was supported in part by the grant "Emergent computation in collective decision making by the crevice-dwelling rock ant Temnothorax rugatulus" provided by the National Science Foundation (NSF grant PHY-1505048).
The rinform package includes the C sources of the Inform library. You may need some extra tools to install rinform, as they are required to compile the Inform sources (e.g., Xcode for Mac users, Rtools for Windows users).
To install rinform directly from the CRAN repository you will simply need to type:
install.packages("rinform")
Once installed, you can load rinform prior to use it with:
library(rinform)
To install rinform from its GitHub repository you will need to have installed the devtools package. If you don't have devtools already, you can install it with:
install.packages("devtools")
Load devtools and install the latest stable version of rinform (i.e., master branch):
library(devtools)
install_github("ELIFE-ASU/rinform")
In case you need to use the development version of rinform, you can specify to install from the dev branch:
install_github("ELIFE-ASU/rinform", ref = "dev")
Once installed, you can load rinform prior to use it with:
library(rinform)
rinform, as its parent library Inform, is developed to help anyone interested in applying information-theoretic techniques get things done quickly and painlessly. We cannot do that without your feedback. We host the project's source code and issue tracker on GitHub. Please create an issue if you find a bug, an error in this documentation, or have a feature you'd like to request. Your contribution will make rinform a better tool for everyone.
If you are interested in contributing to rinform, please contact the developers, and we'll get you up and running!
Resources: Source Repository and Issue Tracker
While rinform is a great tool to use directly in R, significant effort has gone into making it easy to access the functionality of the Inform library from other higher-level languages. Here are a few of the wrappers that are under active development:
Inform and its collection of wrappers such as rinform were not created in an intellectual vacuum. Many similar projects have preceded it and are under active development. All of those projects have advantages and disadvantages compared to Inform. If Inform doesn't meet your needs, I hope you will let us know, and try one of these:
Copyright © 2017-2018 Gabriele Valentini and Douglas G. Moore, Arizona State University. Free use of this software is granted under the terms of the MIT License. See the LICENSE file for details.
The `Dist` class provides an empirical distribution, i.e., a histogram, representing the observed frequencies of some fixed-sized set of events. This class is the basis for all of the fundamental information measures provided on discrete probability distributions.

The purpose of the `Dist` class is to make writing custom implementations of information-theoretic measures possible. Most of the measures implemented in rinform use this structure at their core.
+--------------------+----------------------------------------------------------+
| Type               | Functions                                                |
+====================+==========================================================+
| Class              | Dist                                                     |
+--------------------+----------------------------------------------------------+
| Allocation         | resize, copy, infer, approximate, uniform                |
+--------------------+----------------------------------------------------------+
| Accessors/Mutators | length, counts, valid, get_item, set_item, tick,         |
|                    | accumulate                                               |
+--------------------+----------------------------------------------------------+
| Probabilities      | probability, dump                                        |
+--------------------+----------------------------------------------------------+
This class is the basis for almost all of the calculations performed via the library. It provides a means of managing observations and converting the observed event frequencies to probabilities.
The distribution is, roughly, a histogram with finite support. The events are assumed to be mapped to the dense set of integers $\{0,1,\ldots,N-1\}$ where $N$ is the number of observable events. The number of observable events can be extracted with `length`.
Whenever the size of the support or the number of observed events is zero, the distribution is considered invalid, meaning that you cannot trust any probabilities extracted from it.
Interface:
A distribution of observed event frequencies. If the parameter `n` is an integer, the distribution is constructed with a zeroed support of size $n$. If `n` is a vector of integer values, the sequence is treated as the underlying support. On the other hand, if `n` is a vector of floating-point values, it is treated as a probability distribution and must sum to unity. Note that if a probability distribution is given as the underlying support, it will first be converted to a histogram with a precision of 9 significant figures.
Dist(n)
Examples:
Create an empty distribution with support size 3:
Dist(3)
Create a distribution with 2 events, the first observed 5 times, the second observed 3 times:
Dist(c(5, 3))
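The probability-vector form of the constructor can also be exercised directly; a minimal sketch, assuming rinform is installed, in which the floating-point values are converted internally as described above:

```r
library(rinform)

# Construct a distribution from floating-point probabilities; they are
# converted to a histogram with 9 significant figures of precision.
d <- Dist(c(0.5, 0.25, 0.25))
dump(d)
```

Dumping the distribution should recover the original probabilities.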
If the distribution is `NULL`, a new distribution of the requested size is created. If the distribution already has the requested size, then the original distribution is returned. If the requested size is greater than the current size, then the newly observable events are assigned zero frequency. If the requested size is less than the current size, then the discarded events are lost and the observation count is adjusted accordingly.
Note that the distribution cannot be reduced to size 0. In that case the distribution is left unmodified.
Interface:
Resize the distribution `d` to have new support of size `n`.
resize(d, n)
Examples:
Create a distribution with size 2:
d <- Dist(c(2, 13))
d
Increase the size of the support to 5:
d <- resize(d, 5)
d
Decrease the size of the support to 3:
d <- resize(d, 3)
d
If the source distribution is `NULL`, then `NULL` is returned.
Interface:
Return a copy of the distribution `d`.
copy(d)
Examples:
Create a base distribution `d`:
d <- Dist(c(1:5))
d
Copy the distribution to a different object `p`:
p <- copy(d)
p
Interface:
Infer a distribution from a collection of observed `events`.
infer(events)
Examples:
Create a distribution $d = \{3/5, 2/5\}$:
d <- infer(c(0, 0, 1, 0, 1))
dump(d)
Create a distribution $d = \{3/8, 3/8, 2/8\}$:
d <- infer(c(0, 0, 1, 0, 1, 2, 2, 1))
dump(d)
Interface:
Approximate a given probability distribution to a given tolerance.
approximate(probs, tol)
Examples:
Approximate a distribution with tolerance $10^{-3}$:
probs <- c(0.5, 0.2, 0.3)
d <- approximate(probs, 1e-3)
d$histogram
probs <- c(1.0 / 3, 2.0 / 3)
d <- approximate(probs, 1e-3)
d$histogram
Interface:
Create a uniform distribution of a given size `n`. The support size `n` must define a valid support (i.e., `n` must be greater than 0).
uniform(n)
Examples:
Initialize a uniform distribution of size 3:
d <- uniform(3)
d
dump(d)
Interface:
Get the size of the distribution's support. If the distribution is `NULL`, then a support of 0 is returned.
length(d)
Examples:
Size of a NULL distribution:
d <- NULL
length(d)
Size of a distribution with 5 elements:
d <- Dist(5)
length(d)
Interface:
Get the total number of observations made so far.
counts(d)
Examples:
Counts of valid distribution:
dist <- Dist(c(5, 10))
counts(dist) == 15
Let's modify the distribution and count again:
dist <- set_item(dist, 2, 5)
counts(dist) == 10
Interface:
Determine whether or not the distribution is valid. In order to safely extract probabilities, the size of the support must be nonzero and the number of observations must be nonzero. In any other case, the distribution is invalid.
valid(d)
Examples:
Invalid distribution with 0 observations:
dist <- Dist(3)
valid(dist)
A (valid) distribution with a support of 5 elements:
dist <- Dist(c(1:5))
valid(dist)
Let's invalidate the distribution by setting its support size to zero:
dist$size <- as.integer(0)
valid(dist)
Interface:
Get the number of occurrences of a given `event`.
get_item(d, event)
Examples:
Get all items in a sequence from a valid distribution:
dist <- Dist(c(3, 2, 1, 0))
get_item(dist, 1) == 3
get_item(dist, 2) == 2
get_item(dist, 3) == 1
get_item(dist, 4) == 0
Interface:
Set the number of occurrences of a given `event`. This function manually sets the number of occurrences of a given event. Note that the only restriction is that the value be non-negative. This means that this function can be used to invalidate the distribution by changing all of the event frequencies to zero.
set_item(d, event, value)
Examples:
Let's initialize an empty distribution:
dist <- Dist(2)
dist
Now we can set items and make it a valid distribution:
dist <- set_item(dist, 1, 3)
dist <- set_item(dist, 2, 8)
dist
Interface:
Increment the number of observations of a given `event` and return an updated copy of the distribution `d`. As an alternative to `set_item`, this function simply increments the number of occurrences of a given event. This is useful when iteratively observing events.
tick(d, event)
Examples:
Let's create an initial distribution:
dist <- Dist(c(2, 4))
Now we can add an observation for each event:
dist <- tick(dist, 1)
get_item(dist, 1) == 3
dist <- tick(dist, 2)
get_item(dist, 2) == 5
Interface:
Accumulate observations from a series. If an invalid distribution is provided, no events will be observed and an error will be raised. If an invalid event is provided, then the number of valid events to that point will be added and a warning will be raised.
accumulate(d, events)
Examples:
Let's create a valid distribution and dump it:
d <- Dist(c(1, 2, 3))
dump(d)
We can now accumulate events:
events <- c(0, 0, 1, 0, 1)
d <- accumulate(d, events)
dump(d)
If we try to accumulate invalid events,
events <- c(0, 1, 1, 3, 1)
d <- accumulate(d, events)
dump(d)
only events up to the first invalid entry will be added.
Interface:
Extract the probability of an `event` from a distribution `d`. This function simply computes the probability of the given `event` and returns that value.
probability(d, event)
Example:
Let's initialize a distribution with a support of 3:
dist <- Dist(c(2, 2, 4))
We can compute probabilities as:
probability(dist, 1) == 0.25
probability(dist, 2) == 0.25
probability(dist, 3) == 0.50
Interface:
Dump the probabilities of all events to an array. This function computes the probabilities of all of the events in the support and returns them as an array.
dump(d)
Example:
Let's initialize distribution and dump its probabilities:
dist <- Dist(c(2, 2, 4))
dump(dist)
Now we can modify the distribution and dump it again:
dist <- set_item(dist, 1, 12)
dump(dist)
The rinform package provides a collection of entropy and other information measures defined on discrete probability distributions (see Dist). These functions form the core of Inform, as all of the time series analysis functions are built upon them. rinform provides access to these functions in case the user wants to use them outside of the (already wrapped) time series measures.
Taking $X$ to be a random variable with $p_X$ a probability distribution on $X$, the base-$b$ Shannon entropy is defined as $$ H(X) = -\sum_{x} p_X(x) \log_b p_X(x). $$
See [@Shannon1948] for more details.
Interface:
Compute the base-`b` Shannon entropy of the distribution `p`.
shannon_entropy(p, b = 2)
Example:
Compute the Shannon entropy of a uniform distribution:
p <- Dist(c(1, 1, 1, 1))
shannon_entropy(p)
shannon_entropy(p, b = 4)
Compute the Shannon entropy of a non-uniform distribution:
p <- Dist(c(2, 1))
shannon_entropy(p)
shannon_entropy(p, b = 3)
Mutual information provides a measure of the mutual dependence between two random variables. Let $X$ and $Y$ be random variables with probability distributions $p_X$ and $p_Y$ respectively, and $p_{X,Y}$ the joint probability distribution over $(X,Y)$. The base-$b$ mutual information between $X$ and $Y$ is defined as $$ \begin{split} I(X;Y) &= \sum_{x,y} p_{X,Y}(x,y) \log_b \frac{p_{X,Y}(x,y)}{p_X(x)p_Y(y)} \\ &= H(X) + H(Y) - H(X,Y). \end{split} $$ Here the second line takes advantage of the properties of logarithms and the definition of Shannon entropy.
To some degree one can think of mutual information as a measure of the (linear and non-linear) correlations between random variables.
See [@Cover1991] for more details.
Interface:
Compute the base-`b` mutual information between two random variables $X$ and $Y$.
shannon_mutual_info(p_xy, p_x, p_y, b = 2)
Examples:
Compute the base2 mutual information between two random variables:
xy <- Dist(c(10, 70, 15, 5))
x <- Dist(c(80, 20))
y <- Dist(c(25, 75))
shannon_mutual_info(xy, x, y)
Conditional entropy quantifies the amount of information required to describe a random variable $X$ given knowledge of a random variable $Y$. With $p_Y$ the probability distribution of $Y$, and $p_{X,Y}$ the joint distribution for $(X,Y)$, the base-$b$ conditional entropy is defined as $$ \begin{split} H(X|Y) &= -\sum_{x,y} p_{X,Y}(x,y) \log_b \frac{p_{X,Y}(x,y)}{p_Y(y)} \\ &= H(X,Y) - H(Y). \end{split} $$
See [@Cover1991] for more details.
Interface:
Compute the base-`b` conditional entropy given joint (`p_xy`) and marginal (`p_y`) distributions.
shannon_conditional_entropy(p_xy, p_y, b = 2)
Examples:
Compute the base2 conditional entropy between two random variables:
xy <- Dist(c(10, 70, 15, 5))
x <- Dist(c(80, 20))
y <- Dist(c(25, 75))
shannon_conditional_entropy(xy, x)
shannon_conditional_entropy(xy, y)
Conditional mutual information was introduced by [@Dobrushin1959] and [@Wyner1978] and approximately quantifies the average mutual information between random variables $X$ and $Y$ given knowledge of a third $Z$. Following the same notation as for the conditional entropy, the base-$b$ conditional mutual information is defined as $$ \begin{split} I(X;Y|Z) &= \sum_{x,y,z} p_{X,Y,Z}(x,y,z) \log_b \frac{p_{X,Y|Z}(x,y|z)}{p_{X|Z}(x|z)p_{Y|Z}(y|z)} \\ &= \sum_{x,y,z} p_{X,Y,Z}(x,y,z) \log_b \frac{p_{X,Y,Z}(x,y,z)p_{Z}(z)}{p_{X,Z}(x,z)p_{Y,Z}(y,z)} \\ &= H(X,Z) + H(Y,Z) - H(Z) - H(X,Y,Z) \end{split} $$
Interface:
Compute the base-`b` conditional mutual information given the joint (`p_xyz`) and marginal (`p_xz`, `p_yz`, `p_z`) distributions.
shannon_cond_mutual_info(p_xyz, p_xz, p_yz, p_z, b = 2)
Examples:
Compute the base2 conditional mutual information between two random variables given a third:
xyz <- Dist(c(24, 24, 9, 6, 25, 15, 10, 5))
xz <- Dist(c(15, 9, 5, 10))
yz <- Dist(c(9, 15, 10, 15))
z <- Dist(c(3, 5))
shannon_cond_mutual_info(xyz, xz, yz, z)
Relative entropy, also known as the Kullback-Leibler divergence, was introduced by Kullback and Leibler in 1951 [@Kullback1951]. Given a random variable $X$ and two probability distributions $p_X$ and $q_X$, relative entropy measures the information gained in switching from the prior $q_X$ to the posterior $p_X$: $$ D_{KL}(p_X \| q_X) = \sum_x p_X(x) \log_b \frac{p_X(x)}{q_X(x)}. $$
Many of the information measures (e.g. mutual information, conditional entropy) amount to applications of relative entropy for various prior and posterior distributions.
Interface:
Compute the base-`b` relative entropy between posterior (`p`) and prior (`q`) distributions.
shannon_relative_entropy(p, q, b = 2)
Examples:
Compute the base2 relative entropy between posterior and prior distributions:
p <- Dist(c(4, 1))
q <- Dist(c(1, 1))
shannon_relative_entropy(p, q)
shannon_relative_entropy(q, p)
p <- Dist(c(1, 0))
q <- Dist(c(1, 1))
shannon_relative_entropy(p, q)
shannon_relative_entropy(q, p)
The cross entropy between two probability distributions $p$ and $q$ over the same support measures the average information needed to identify an event drawn from the set using a coding optimized for an "unnatural" probability distribution $q$, rather than the "true" distribution $p$. The cross entropy for the distributions $p$ and $q$ is defined as: $$ \begin{split} H(p_{X}, q_{X}) &= -\sum_{x} p_{X}(x) \log_b q_{X}(x) \\ &= H(p_{X}) + D_{KL}(p_X \| q_X). \end{split} $$
Interface:
Compute the base-`b` Shannon cross entropy between a true distribution `p` and an unnatural distribution `q`.
shannon_cross_entropy(p, q, b = 2)
Examples:
Compute the base2 cross entropy between two distributions:
p <- Dist(c(1, 0, 0))
q <- Dist(c(2, 1, 1))
shannon_cross_entropy(p, q)
shannon_cross_entropy(q, p)
shannon_cross_entropy(p, q, b = 3)
shannon_cross_entropy(q, p, b = 3)
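As a sanity check on the decomposition above, the cross entropy can be recomputed from the Shannon entropy and the relative entropy; a minimal sketch using the functions documented in this section, with an illustrative pair of distributions:

```r
library(rinform)

# Verify H(p, q) = H(p) + D_KL(p || q) on a small pair of distributions.
p <- Dist(c(4, 1))
q <- Dist(c(1, 4))
lhs <- shannon_cross_entropy(p, q)
rhs <- shannon_entropy(p) + shannon_relative_entropy(p, q)
all.equal(lhs, rhs)
```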
The original purpose of Inform was to analyze time series data. This explains why most of Inform's functionality resides in functions specifically optimized for analyzing time series. Many information measures have "local" variants which compute a time series of pointwise values. This feature is directly inherited by the rinform wrapper.
We have been meticulous in ensuring that function and parameter names are consistent across measures. If you notice some inconsistency, please report it as an issue.
Throughout the discussion of time series measures, we will try to use a consistent notation. We will denote random variables as $X,Y,\ldots$, and let $x_i,y_i,\ldots$ represent the $i$-th time step of a time series drawn from the associated random variable. Many of the measures consider $k$-histories (a.k.a. $k$-blocks) of the time series, e.g. $$x_i^{(k)} = \left\{x_{i-k+1}, x_{i-k+2},\ldots,x_i\right\}.$$
When denoting probability distributions, we will only make the random variable explicit in situations where the notation is ambiguous. We will typically write $p(x_i)$, $p(x_i^{(k)})$, and $p(x_i^{(k)}, x_{i+1})$ to denote the empirical probability of observing the $x_i$ state, the $x_i^{(k)}$ $k$history, and the joint probability of observing $\left(x_i^{(k)}, x_{i+1}\right)$.
The word "base" has two different meanings in the context of information measures on time series. It could refer to the base of the time series itself, that is, the number of unique states in the time series. For example, the time series $\{0,2,1,0,0\}$ is a base-3 time series. On the other hand, it could refer to the base of the logarithm used in computing the information content of the inferred probability distributions. The problem is that these two meanings clash. The base of the time series affects the range of values the measure can produce, and the base of the logarithm represents a rescaling of those values.
In the rinform wrapper we deal with this by always using base-2 logarithms and automatically computing the base of each time series internally to the wrapper. All of this ensures that the library is as simple as reasonably possible.
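For instance, a base-3 series can be passed directly to a measure such as active information (documented below); the base of the series is inferred internally, so no base argument is needed. A minimal sketch with an illustrative series:

```r
library(rinform)

# The series contains states {0, 1, 2}, so rinform infers base 3;
# the result is still reported in bits (base-2 logarithms).
series <- c(0, 2, 1, 0, 0, 2, 1, 1, 0)
active_info(series, k = 2)
```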
You generally need a lot of data to infer a probability distribution. An experimentalist or simulator might then collect data over multiple trials or initial conditions. Generally, this process also involves a large number of variables. Most of rinform's time series measures allow the user to pass in a two-dimensional matrix with each column representing a time series from a different initial condition and/or variable. From this the probability distributions are inferred, and the desired value is calculated. This has the downside of requiring that the user store all of the data in memory, but it has the advantage of being fast and simple.
Time series measures provided by rinform accept as arguments Vector and Matrix objects. A Vector, for example,
xs <- c(1, 0, 0, 1, 0, 0, 1, 1)
xs
defines one time series and its length defines the number of time steps, $m$. Specifically, it defines one initial condition or trial for one variable. The user can deal with multiple variables, $l$, and multiple initial conditions, $n$, using Matrix objects with $l*n$ columns and $m$ rows:
l <- 2
n <- 2
m <- 5
xs <- matrix(sample(0:1, m * l * n, TRUE), nrow = m, ncol = l * n)
xs
where each column represents a different time series, possibly from different variables.
In the previous example we have $l = 2$ variables, $X_1$ and $X_2$, with $n = 2$ trials each, where each series has $m = 5$ time steps. Columns `[, 1]` and `[, 2]` represent trials 1 and 2 of variable $X_1$, while columns `[, 3]` and `[, 4]` represent trials 1 and 2 of variable $X_2$.
When possible, rinform's functions infer the number of variables, $l$, the number of initial conditions, $n$, and the number of time steps, $m$, from the provided data structures. This is possible in most cases due to the fact that all time series must have the same number of initial conditions and time steps. If this is not the case, rinform's API will explicitly require the user to specify the number of variables $l$.
Active information (AI) was introduced in [@Lizier2012] to quantify information storage in distributed computations. Active information is defined in terms of a temporally local variant
$$ a_{X,i}(k) = \log_2{\frac{p(x_i^{(k)}, x_{i+1})}{p(x_i^{(k)})p(x_{i+1})}}, $$
where the probabilities are constructed empirically from the entire time series. From the local variant, the temporally global active information is defined as
$$ A_X(k) = \langle a_{X,i}(k) \rangle_i = \sum_{x_i^{(k)},x_{i+1}} p(x_i^{(k)},x_{i+1}) \log_2{\frac{p(x_i^{(k)}, x_{i+1})}{p(x_i^{(k)})p(x_{i+1})}}. $$
Strictly speaking, the local and average active information are defined as
$$ a_{X,i} = \lim_{k\rightarrow \infty}a_{X,i}(k) \qquad \textrm{and} \qquad A_X = \lim_{k\rightarrow \infty}A_X(k), $$
but we do not yet provide limiting functionality in this library (see the GitHub issue tracker).
Interface:
Compute the average or local active information with a history length `k`.
active_info(series, k, local = FALSE)
Examples:
Compute Active Information with one initial condition:
series <- c(0, 0, 1, 1, 1, 1, 0, 0, 0)
active_info(series, k = 2)
and local variant:
lai <- active_info(series, k = 2, local = T)
t(lai)
Compute Active Information with two initial conditions:
series <- matrix(nrow = 9, ncol = 2)
series[, 1] <- c(0, 0, 1, 1, 1, 1, 0, 0, 0)
series[, 2] <- c(1, 0, 0, 1, 0, 0, 1, 0, 0)
active_info(series, k = 2)
and local variant:
lai <- active_info(series, k = 2, local = T)
t(lai)
Block entropy, also known as $N$-gram entropy [@Shannon1948], is the standard Shannon entropy of the $k$-histories of a time series: $$ H(X^{(k)}) = -\sum_{x_i^{(k)}} p(x_i^{(k)}) \log_2{p(x_i^{(k)})} $$ which reduces to the traditional Shannon entropy for $k = 1$.
Interface:
Compute the average or local block entropy of a time series with block size `k`.
block_entropy(series, k, local = FALSE)
Examples:
Compute Block Entropy with one initial condition and history length $k \in \{1, 2\}$:
series <- c(0, 0, 1, 1, 1, 1, 0, 0, 0)
block_entropy(series, k = 1)
block_entropy(series, k = 2)
and local variant:
be <- block_entropy(series, k = 1, local = T)
t(be)
be <- block_entropy(series, k = 2, local = T)
t(be)
Compute Block Entropy with two initial conditions and history length $k = 2$:
series <- matrix(nrow = 9, ncol = 2)
series[, 1] <- c(0, 0, 1, 1, 1, 1, 0, 0, 0)
series[, 2] <- c(1, 0, 0, 1, 0, 0, 1, 0, 0)
block_entropy(series, k = 2)
and local variant:
be <- block_entropy(series, k = 2, local = T)
t(be)
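The $k = 1$ reduction to the traditional Shannon entropy can be checked directly by comparing against the Shannon entropy of the inferred distribution of single states; a minimal sketch combining `block_entropy` with the `infer` and `shannon_entropy` functions documented earlier:

```r
library(rinform)

# For k = 1, block entropy coincides with the Shannon entropy of the
# empirical distribution of states.
series <- c(0, 0, 1, 1, 1, 1, 0, 0, 0)
block_entropy(series, k = 1)
shannon_entropy(infer(series))
```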
Conditional entropy is a measure of the amount of information required to describe a random variable $Y$ given knowledge of another random variable $X$. When applied to time series, two time series are used to construct the empirical distributions, and the conditional entropy is given by $$ H(Y|X) = -\sum_{x_i,y_i} p(x_i,y_i) \log_2{p(y_i|x_i)}. $$
This can be viewed as the time-average of the local conditional entropy $$ h_i(Y|X) = -\log_2{p(y_i|x_i)}. $$ See [@Cover1991] for more information.
Interface:
Compute the average or local conditional entropy between two time series. This function expects the condition to be the first argument, `xs`. It is expected that each time series has the same length `n`, but they may have different bases.
conditional_entropy(xs, ys, local = FALSE)
Examples:
Compute Conditional Entropy between two time series:
xs <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1)
ys <- c(0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1)
conditional_entropy(xs, ys)
conditional_entropy(ys, xs)
and local variant:
ce <- conditional_entropy(xs, ys, local = T)
t(ce)
ce <- conditional_entropy(ys, xs, local = T)
t(ce)
Cross entropy between two distributions $p_X$ and $q_X$ measures the amount of information needed to identify events using a coding scheme optimized for $q_X$ when $p_X$ is the "real" distribution over $X$: $$ H(p,q) = -\sum_{x} p(x) \log_2{q(x)} $$ Cross entropy's local variant is equivalent to the self-information of $q_X$ and as such is implemented by the local block entropy.
See [@Cover1991] for more details.
Interface:
Compute the cross entropy between the "true" and "unnatural" distributions $p_X$ and $q_X$ from associated time series `ps` and `qs`, respectively.
cross_entropy(ps, qs)
Examples:
Compute Cross Entropy between two time series:
ps <- c(0, 1, 1, 0, 1, 0, 0, 1, 0, 0)
qs <- c(0, 0, 0, 0, 0, 1, 1, 0, 0, 1)
cross_entropy(ps, qs)
cross_entropy(qs, ps)
Effective information is a causal measure aimed at teasing out the causal structure of a dynamical system. In essence, it is the mutual information between an "intervention" distribution — a probability distribution over initial states — and the distribution after one time step: $$ EI(A,p) = I(p,p^TA) $$ where $A$ is the transition probability matrix and $p$ an intervention distribution. Functionality to construct a transition probability matrix from time series is provided by the series_to_tpm function.
See [@Tononi2003], [@Hoel2013] or [@Hoel2017] for more information.
Interface:
Compute the effective information from an $n \times n$ transition probability matrix `tpm` given an intervention distribution `inter`. If `inter` is `NULL`, then the uniform distribution over the $n$ states is used.
effective_info(tpm, inter = NULL)
Examples:
Compute Effective Information with a uniform intervention distribution:
tpm <- matrix(0, nrow = 2, ncol = 2)
tpm[, 1] <- c(0.50, 0.5)
tpm[, 2] <- c(0.25, 0.75)
effective_info(tpm, NULL)
Compute Effective Information with a nonuniform intervention distribution:
tpm <- matrix(0, nrow = 2, ncol = 2)
tpm[, 1] <- c(0.50, 0.5)
tpm[, 2] <- c(0.25, 0.75)
inter <- c(0.488372, 0.511628)
effective_info(tpm, inter)
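The `series_to_tpm` utility mentioned above can be combined with `effective_info` to go directly from data to a value; a sketch, assuming `series_to_tpm` accepts a single time series vector and that the specific series below is purely illustrative:

```r
library(rinform)

# Estimate a transition probability matrix from a binary time series,
# then compute the effective information under a uniform intervention.
series <- c(0, 1, 1, 0, 1, 0, 0, 1, 1, 0)
tpm <- series_to_tpm(series)
tpm
effective_info(tpm)
```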
Entropy rate quantifies the amount of information needed to describe the next state of $X$ given observations of $X^{(k)}$. In other words, it is the entropy of the time series conditioned on the $k$-histories. The local entropy rate $$ h_{X,i}(k) = -\log_2{\frac{p(x_i^{(k)}, x_{i+1})}{p(x_i^{(k)})}} $$ can be averaged to obtain the global entropy rate $$ H_X(k) = \langle h_{X,i}(k) \rangle_i = -\sum_{x_i^{(k)},x_{i+1}} p(x_i^{(k)},x_{i+1}) \log_2{\frac{p(x_i^{(k)}, x_{i+1})}{p(x_i^{(k)})}}. $$
Much as with active information, the local and average entropy rates are formally obtained in the limit $$ h_{X,i} = \lim_{k\rightarrow \infty}h_{X,i}(k) \qquad \textrm{and} \qquad H_X = \lim_{k\rightarrow \infty}H_X(k), $$
but we do not yet provide limiting functionality in this library (see the GitHub issue tracker).
See [@Cover1991] for more details.
Interface:
Compute the average or local entropy rate with a history length `k`.
entropy_rate(series, k, local = FALSE)
Examples:
Compute the Entropy Rate for one initial condition:
series <- c(0, 0, 1, 1, 1, 1, 0, 0, 0)
entropy_rate(series, k = 2)
and local variant:
er <- entropy_rate(series, k = 2, local = T)
t(er)
Compute the Entropy Rate for two initial conditions:
series <- matrix(nrow = 9, ncol = 2)
series[, 1] <- c(0, 0, 1, 1, 1, 1, 0, 0, 0)
series[, 2] <- c(1, 0, 0, 1, 0, 0, 1, 0, 0)
entropy_rate(series, k = 2)
local variant:
er <- entropy_rate(series, k = 2, local = T)
t(er)
Formally, the excess entropy is the mutual information between two adjacent, semi-infinite blocks of variables: $$ E_X = \lim_{k \rightarrow \infty} I[(x_{-k},\ldots,x_{-1}),(x_0,\ldots,x_{k-1})]. $$ Because we cannot take the limit in practice, we implement the finite form: $$ E_X(k) = I[(x_{-k},\ldots,x_{-1}),(x_0,\ldots,x_{k-1})]. $$
We can think of excess entropy as a slight generalization of active information or a special case of predictive information.
See [@Crutchfield2003] and [@Feldman2003] for more details.
Interface:
Compute the average or local excess entropy from a time series with block size `k`.
excess_entropy(series, k, local = FALSE)
Examples:
Compute Excess Entropy for one initial condition:
series <- c(0, 0, 1, 1, 0, 0, 1, 1, 0)
excess_entropy(series, k = 2)
and local variant:
ee <- excess_entropy(series, k = 2, local = TRUE)
t(ee)
Compute Excess Entropy for two initial conditions:
series <- matrix(0, nrow = 9, ncol = 2)
series[, 1] <- c(0, 0, 1, 1, 0, 0, 1, 1, 0)
series[, 2] <- c(0, 1, 0, 1, 0, 1, 0, 1, 0)
excess_entropy(series, k = 2)
and local variant:
ee <- excess_entropy(series, k = 2, local = TRUE)
t(ee)
Evidence of Integration (EoI) was introduced in [@Biehl2016] as a means of identifying representations of agents in dynamical systems. Given a collection of random variables, $\mathcal{X}=\{X_1,X_2,\ldots,X_l\}$, we can form (non-trivial) partitionings of $\mathcal{X}$: $$ \left\{ \mathcal{Z} \subset \mathcal{P}(\mathcal{X})\setminus\{\mathcal{X},\varnothing\} \,\middle|\, \mathcal{Z} \neq \varnothing, \bigcup_{Z \in \mathcal{Z}} Z = \mathcal{X} ~\textrm{and}~ Z_i \cap Z_j = \varnothing, i \neq j \right\} $$
Each partitioning $\mathcal{Z} = \{Z_1, \ldots, Z_m\}$ consists of a collection of joint random variables built from $\mathcal{X}$. The configurations of $\mathcal{Z}$ are in one-to-one correspondence with the configurations of $\mathcal{X}$. Given such a partitioning, we can ask what the pointwise mutual information is: $$ mi_\mathcal{Z}(z_1, \ldots, z_m) = \log{\frac{p(z_1, \ldots, z_m)}{p(z_1) \ldots p(z_m)}}. $$
If $z=\{z_1,\ldots,z_m\}$ corresponds to the configuration $x=\{x_1,\ldots,x_l\}$, we refer to the above mutual information as the "evidence of integration of $x$ with respect to the partitioning $\mathcal{Z}$".
We say that a configuration $x$ is integrated if and only if the evidence of integration of $x$ is positive for all partitionings of the system.
Interface:
Given a sequence of $n$ observed states of $l$ random variables (`series`), compute the evidence of integration for each partitioning of the $l$ variables, and return the minimum and maximum evidence for each observation. If the number of variables $l$ is large, the user can test a small subset of the partitioning schemata by providing a partitioning `parts` to increase confidence that the system is integrated. In this case, the function computes and returns the evidence of integration for each observation with respect to the partitioning `parts`.
integration_evidence(series, parts = NULL)
Examples:
Compute Evidence of Integration of three time series:
series <- matrix(0, nrow = 10, ncol = 3)
series[, 1] <- c(0, 1, 0, 1, 1, 1, 0, 0, 1, 0)
series[, 2] <- c(0, 1, 0, 1, 1, 1, 0, 0, 1, 0)
series[, 3] <- c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0)
integration_evidence(series)
Compute Evidence of Integration for the same three time series but only for the partitioning `c(1, 1, 2)`:
parts <- c(1, 1, 2)
integration_evidence(series, parts)
Information flow (IF) was introduced by Ay and Polani as a measure of the "strength of a causal effect" [@Ay2008]. Unlike many of the other measures in rinform, information flow was designed around interventions. Rather than allowing dynamics to progress naturally, some subset of the random variables are forced into a particular state. If no interventions are performed, then IF reduces to Conditional Mutual Information.
Practically, there is no way for rinform to know from time series whether or not any interventions have actually been performed upon the system. As such, the onus is on the user to determine whether this function is in fact computing Ay and Polani's information flow.
In accordance with [@Ay2008], information flow is defined as $$ I_p(X \rightarrow Y \mid Z) = \sum_{x,y,z} p(z)p(x \mid \hat{z})p(y \mid \hat{x},\hat{z}) \log{ \frac{p(y \mid \hat{x},\hat{z})} {\sum_{x^\prime} p(x^\prime \mid \hat{z})p(y \mid \hat{x}^\prime,\hat{z})} } $$
All probabilities are estimated from sequences of observations — the order of those sequences is irrelevant. A future version of Inform, and thereafter rinform, will allow the user to specify intervention distributions.
Interface:
Compute the information flow from `lsrc` time series `src` to another `ldst` time series `dst`. Optionally, the user can specify `lback` background time series `back` on which to condition the probabilities.
info_flow(src, dst, back = NULL, lsrc, ldst, lback = 0)
Examples:
This example demonstrates how to use `info_flow` to analyze Ay and Polani's diamond structure example, pp. 28-29 in [@Ay2008]. In this example, we have four interacting Boolean variables $W$, $X$, $Y$ and $Z$. The variables $X$ and $Y$ are each dependent on $W$: they simply copy its state. The $Z$ variable is the XOR of the states of $X$ and $Y$. The variable $W$ is uniformly distributed.
Left to its own devices, this system might produce a sequence of observations like (natural dynamics):
ws <- c(0, 0, 1, 0, 1, 1, 0, 1)
xs <- c(0, 0, 1, 0, 1, 1, 0, 1)
ys <- c(0, 0, 1, 0, 1, 1, 0, 1)
zs <- c(0, 0, 0, 0, 0, 0, 0, 0)
From these we can compute the following information flows: $$ I_p(X \rightarrow Y) = 1 \\ I_p(X \rightarrow Y \mid W) = 0 \\ I_p(W \rightarrow Z \mid Y) = 0. $$
Our notation departs somewhat from that of [@Ay2008] as we denote these as information flows despite the fact that we are not strictly intervening upon the system. If the user prefers, they can think of this as "intervening on the system in a way that is indistinguishable from the natural dynamics". In any case, we can use `info_flow` to compute the above values:
info_flow(src = xs, dst = ys, lsrc = 1, ldst = 1)
info_flow(src = xs, dst = ys, back = ws, lsrc = 1, ldst = 1, lback = 1)
info_flow(src = ws, dst = zs, back = ys, lsrc = 1, ldst = 1, lback = 1)
Now, consider that we can intervene on $Y$ and force it into whichever state we choose. Then we might end up with a sequence of observations such as:
ws <- c(0, 0, 1, 0, 1, 1, 0, 1)
xs <- c(0, 0, 1, 0, 1, 1, 0, 1)
ys <- c(1, 0, 1, 0, 0, 1, 1, 0)
zs <- c(1, 0, 0, 0, 1, 0, 1, 1)
From these we can compute the following information flows: $$ I_p(X \rightarrow Y) = 0 \\ I_p(X \rightarrow Y \mid W) = 0 \\ I_p(W \rightarrow Z \mid Y) = 1. $$
In rinform this looks like:
info_flow(src = xs, dst = ys, lsrc = 1, ldst = 1)
info_flow(src = xs, dst = ys, back = ws, lsrc = 1, ldst = 1, lback = 1)
info_flow(src = ws, dst = zs, back = ys, lsrc = 1, ldst = 1, lback = 1)
Mutual information (MI) is a measure of the amount of mutual dependence between at least two random variables. Locally, MI is defined as $$ i_i(X_1,\ldots,X_l) = \log_2{\frac{p(x_{1,i},\ldots,x_{l,i})}{p(x_{1,i})\cdots p(x_{l,i})}}. $$ The mutual information is then just the time average of $i_i(X_1, \ldots, X_l)$: $$ I(X_1,\ldots,X_l) = \sum_{x_{1,i},\ldots,x_{l,i}} p(x_{1,i},\ldots,x_{l,i}) \log_2{\frac{p(x_{1,i},\ldots,x_{l,i})}{p(x_{1,i})\cdots p(x_{l,i})}}. $$
See [@Cover1991] for more details.
Interface:
Compute the mutual information or its local variant between two or more time series. Each variable can have a different base.
mutual_info(series, local = FALSE)
Examples:
Compute the Mutual Information between two variables:
xs <- matrix(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1,
               0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1), ncol = 2)
mutual_info(xs)
and local variant:
mi <- mutual_info(xs, local = TRUE)
t(mi)
Compute the Mutual Information between three variables:
xs <- matrix(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1,
               0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1,
               1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1), ncol = 3)
mutual_info(xs)
and local variant:
mi <- mutual_info(xs, local = TRUE)
t(mi)
Formally, the predictive information is the mutual information between a finite history and a semi-infinite future: $$ P_X(k) = \lim_{l \rightarrow \infty} I[(x_{-k},\ldots,x_{-1}),(x_0,\ldots,x_{l-1})]. $$ Of course, we cannot take the limit in practice, so we implement the finite form: $$ P_X(k,l) = I[(x_{-k},\ldots,x_{-1}),(x_0,\ldots,x_{l-1})]. $$
We can think of active information and excess entropy as special cases of predictive information.
See [@Bialek2001a] and [@Bialek2001b] for more details.
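For instance, the correspondence with active information can be checked directly: the predictive information with a one-step future should agree with the active information for the same history length. A minimal sketch using this package's `active_info` and `predictive_info` functions:

```r
library(rinform)

series <- c(0, 0, 1, 1, 0, 0, 1, 1, 0)

# Active information with history length k should coincide with the
# predictive information P_X(k, 1), i.e., with a one-step future.
active_info(series, k = 2)
predictive_info(series, kpast = 2, kfuture = 1)
```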
Interface:
Compute the average or local predictive information from a time series `series` with history length `kpast` and future length `kfuture`.
predictive_info(series, kpast, kfuture, local = FALSE)
Examples:
Our examples will compute the predictive information between the current time step and the next two time steps of a time series, $P_X(1, 2)$.
One initial condition:
series <- c(0, 0, 1, 1, 0, 0, 1, 1, 0)
predictive_info(series, 1, 2)
and local variant:
pi <- predictive_info(series, 1, 2, TRUE)
t(pi)
Two initial conditions:
series <- matrix(0, nrow = 9, ncol = 2)
series[, 1] <- c(0, 0, 1, 1, 0, 0, 1, 1, 0)
series[, 2] <- c(0, 1, 0, 1, 0, 1, 0, 1, 0)
predictive_info(series, 1, 2)
and local variant:
pi <- predictive_info(series, 1, 2, TRUE)
t(pi)
Relative entropy, also known as the Kullback-Leibler divergence, measures the amount of information gained in switching from a prior distribution $q_X$ to a posterior distribution $p_X$ over the same support: $$ D_{KL}(p \| q) = \sum_{x_i} p(x_i) \log_2{\frac{p(x_i)}{q(x_i)}}. $$ The local counterpart is $$ d_{KL,i}(p \| q) = \log_2{\frac{p(x_i)}{q(x_i)}}. $$ Note that the average in moving from the local to the non-local relative entropy is taken over the posterior distribution.
See [@Kullback1951] and [@Cover1991] for more information.
Interface:
Compute the average and local relative entropy between time series drawn from posterior and prior distributions, here `xs` and `ys` respectively.
relative_entropy(xs, ys, local = FALSE)
Examples:
Compute the average relative entropy between two time series:
xs <- c(0, 1, 0, 0, 0, 0, 0, 0, 0, 1)
ys <- c(0, 1, 1, 1, 1, 0, 0, 1, 0, 0)
relative_entropy(xs, ys)
relative_entropy(ys, xs)
and local variant:
re <- relative_entropy(xs, ys, local = TRUE)
t(re)
re <- relative_entropy(ys, xs, local = TRUE)
t(re)
Separable information (SI) was introduced in [@Lizier2010] as a method for detecting information modification in distributed computations. Given a random variable $Y$ and a collection of potentially informative sources $V_Y$ (wherein $Y \not\in V_Y$), separable information quantifies the total information gained about $Y$ from separate observations of histories of $Y$ and "transfer contributions" from each of the sources $X \in V_Y$. In other words, it is the sum of the Active Information of $Y$ and the (apparent) Transfer Entropy from each source to $Y$: $$ s_{Y,i}(k) = a_{Y,i}(k) + \sum_{X \in V_Y} t_{X \rightarrow Y, i}(k), \\ S_Y(k) = \langle s_{Y,i}(k) \rangle_i = A_Y(k) + \sum_{X \in V_Y} T_{X \rightarrow Y}(k). $$
Interface:
Compute the average or local separable information of `dest` given a set of sources `srcs` and history length `k`.
separable_info(srcs, dest, k, local = FALSE)
Examples:
Compute Separable Information for one initial condition and one source:
dest <- c(0, 1, 1, 1, 1, 0, 0, 0, 0)
srcs <- c(0, 0, 1, 1, 1, 1, 0, 0, 0)
separable_info(srcs, dest, k = 2)
and local variant:
si <- separable_info(srcs, dest, k = 2, local = TRUE)
t(si)
Compute Separable Information for one initial condition and multiple sources:
dest <- c(0, 1, 1, 1, 1, 0, 0, 0, 0)
srcs <- matrix(0, nrow = 9, ncol = 2)
srcs[, 1] <- c(0, 0, 1, 1, 1, 1, 0, 0, 0)
srcs[, 2] <- c(1, 1, 1, 1, 0, 0, 0, 0, 0)
separable_info(srcs, dest, k = 2)
and local variant:
si <- separable_info(srcs, dest, k = 2, local = TRUE)
t(si)
Compute Separable Information for multiple initial conditions and multiple sources:
dest <- matrix(0, nrow = 9, ncol = 2)
dest[, 1] <- c(0, 1, 1, 1, 1, 0, 0, 0, 0)
dest[, 2] <- c(1, 1, 0, 1, 1, 0, 1, 1, 0)
srcs <- matrix(0, nrow = 9, ncol = 4)
srcs[, 1] <- c(0, 0, 1, 1, 1, 1, 0, 0, 0)
srcs[, 2] <- c(1, 1, 1, 1, 1, 0, 1, 1, 0)
srcs[, 3] <- c(1, 1, 1, 1, 0, 0, 0, 0, 0)
srcs[, 4] <- c(0, 0, 0, 0, 1, 1, 1, 1, 0)
separable_info(srcs, dest, k = 2)
and local variant:
si <- separable_info(srcs, dest, k = 2, local = TRUE)
t(si)
Transfer entropy (TE) was introduced in [@Schreiber2000] to quantify information transfer between an information source and a destination, conditioning out shared history effects. TE was originally formulated considering only the source and destination; however, many systems of interest have more than just those two components. As such, it may be further necessary to condition the probabilities on the states of all "background" components in the system. These two forms are sometimes referred to as apparent and complete transfer entropy, respectively [@Lizier2008].
Our implementation of TE allows the user to condition the probabilities on any number of background processes, within hardware limits. For the subsequent description, take $X$ to be the source, $Y$ the target, and $\mathcal{W} = \{W_1, \ldots, W_l\}$ the background processes against which we'd like to condition. For example, we might take the states of two nodes in a dynamical network as the source and target, while all other nodes in the network are treated as the background. Transfer entropy is then defined in terms of a time-local variant $$ t_{X \rightarrow Y, \mathcal{W}, i}(k) = \log_2{\frac{p(y_{i+1}, x_i \mid y^{(k)}_i, W_i)} {p(y_{i+1} \mid y^{(k)}_i, W_i)\, p(x_i \mid y^{(k)}_i, W_i)}} $$
where $W_i = w_{1,i}, \ldots, w_{l,i}$ are the states of each of the background nodes at time step $i$, and the probabilities are constructed empirically from the entire time series. From the local variant, the temporally global transfer entropy is defined as $$ T_{X \rightarrow Y, \mathcal{W}}(k) = \langle t_{X \rightarrow Y, \mathcal{W}, i}(k) \rangle_i = \sum_{y_{i+1}, y^{(k)}_i, x_i, W_i} p(y_{i+1}, y^{(k)}_i, x_i, W_i) \log_2{\frac{p(y_{i+1}, x_i \mid y^{(k)}_i, W_i)} {p(y_{i+1} \mid y^{(k)}_i, W_i)\, p(x_i \mid y^{(k)}_i, W_i)}}. $$
Strictly speaking, the local and average transfer entropy are defined as $$ t_{X \rightarrow Y, \mathcal{W}, i} = \lim_{k\rightarrow \infty} t_{X \rightarrow Y, \mathcal{W}, i}(k) \qquad \textrm{and} \qquad T_{X \rightarrow Y, \mathcal{W}} = \lim_{k\rightarrow \infty} T_{X \rightarrow Y, \mathcal{W}}(k), $$
but we do not yet provide limiting functionality in this library (see the GitHub issues).
Interface:
Compute the average or local transfer entropy with history length `k`, with the possibility of conditioning on the background `ws`.
transfer_entropy(ys, xs, ws = NULL, k, local = FALSE)
Examples:
One initial condition, no background:
xs <- c(0, 1, 1, 1, 1, 0, 0, 0, 0)
ys <- c(0, 0, 1, 1, 1, 1, 0, 0, 0)
transfer_entropy(xs, ys, ws = NULL, k = 2)
and local variant:
te <- transfer_entropy(xs, ys, ws = NULL, k = 2, local = TRUE)
t(te)
Two initial conditions, no background:
xs <- matrix(0, nrow = 9, ncol = 2)
xs[, 1] <- c(1, 0, 0, 0, 0, 1, 1, 1, 1)
xs[, 2] <- c(1, 1, 1, 1, 0, 0, 0, 1, 1)
ys <- matrix(0, nrow = 9, ncol = 2)
ys[, 1] <- c(0, 0, 1, 1, 1, 1, 0, 0, 0)
ys[, 2] <- c(1, 0, 0, 0, 0, 1, 1, 1, 0)
transfer_entropy(xs, ys, ws = NULL, k = 2)
and local variant:
te <- transfer_entropy(xs, ys, ws = NULL, k = 2, local = TRUE)
t(te)
One initial condition, one background process:
xs <- c(0, 1, 1, 1, 1, 0, 0, 0, 0)
ys <- c(0, 0, 1, 1, 1, 1, 0, 0, 0)
ws <- c(0, 1, 1, 1, 1, 0, 1, 1, 1)
transfer_entropy(xs, ys, ws, k = 2)
and local variant:
te <- transfer_entropy(xs, ys, ws, k = 2, local = TRUE)
t(te)
The following example is interesting in that the two background processes together provide the same information about the destination as the source process does ($X = W_1 \oplus W_2$), but neither does so separately.
One initial condition, two background processes:
xs <- c(0, 1, 1, 1, 1, 0, 0, 0, 0)
ys <- c(0, 0, 1, 1, 1, 1, 0, 0, 0)
ws <- matrix(c(1, 0, 1, 0, 1, 1, 1, 1, 1,
               1, 1, 0, 1, 0, 1, 1, 1, 1), ncol = 2)
transfer_entropy(xs, ys, ws, k = 2)
and local variant:
te <- transfer_entropy(xs, ys, ws, k = 2, local = TRUE)
t(te)
+--------------------------+----------------------------------+
| Type                     | Functions                        |
+==========================+==================================+
| Binning                  | `series_range`, `bin_series`     |
+--------------------------+----------------------------------+
| Black-Boxing             | `black_box`, `black_box_parts`   |
+--------------------------+----------------------------------+
| Coalescing               | `coalesce`                       |
+--------------------------+----------------------------------+
| State Encoding           | `encode`, `decode`               |
+--------------------------+----------------------------------+
| Partitioning Time Series | `partitioning`                   |
+--------------------------+----------------------------------+
| Time Series to TPM       | `series_to_tpm`                  |
+--------------------------+----------------------------------+
All of the currently implemented time series measures are only defined on discretely-valued time series. However, in practice, continuously-valued time series are ubiquitous. There are two approaches to accommodating continuous values.
The simplest is to bin the time series, forcing the values into discrete states. This method has its downsides, namely that the binning is often a bit unphysical and can introduce bias. What's more, without some kind of guiding principle it can be difficult to decide exactly which binning approach to use.
The second approach attempts to infer continuous probability distributions from the continuous data. This is potentially more robust, but also more technically difficult. Unfortunately, rinform does not yet have an implementation of information measures on continuous distributions.
Interface:
Compute the range, minimum and maximum values of a floating-point time series and return them as a list.
series_range(series)
Examples:
Find range, minimum, and maximum values of a continuous state time series:
xs <- c(0.2, 0.5, 3.2, 1.8, 0.6, 2.3)
series_range(xs)
Bin a floating-point time series following one of three different modalities:

- Bin the series into `b` uniformly sized bins, and return the size of the bins. If the size of each bin is too small, less than $10\epsilon$ (where $\epsilon$ is the double-precision machine epsilon), then all entries are placed in the same bin and an error is set.
- Bin the series into bins of a specified size `step`, and return the number of bins. If the bin size is too small, less than $10\epsilon$, then an error is set.
- Bin the series based on specified boundaries `bounds`, and return the bounds of the bins.

Interface:
Bin a floating-point time series and return a list giving the binned sequence, the number of bins and either the bin sizes or bin bounds.
bin_series(series, b = NA, step = NA, bounds = NA)
Examples:
First method, with `b` uniformly sized bins:
series <- c(1, 2, 3, 4, 5, 6)
bin_series(series, b = 2)
Second method, with a specified bin size `step`:
bin_series(series, step = 2.0)
Third method, with specific bin `bounds`:
bin_series(series, bounds = c(3, 7))
It is often useful when analyzing complex systems to black-box components of the system by aggregating their states. This process amounts to grouping the states of different components of the system and treating them as a single state on a larger state space. In doing so, the black-boxed components are treated as a single entity, without regard for their small-scale structure. As an example, suppose you have two Boolean random variables $X$ and $Y$, and you observe time series of each simultaneously: $$ X:\ \{0,1,1,0,1,0,0,1\} \\ Y:\ \{1,0,0,1,1,0,1,0\} $$
It may be worthwhile to consider these observations as observations of the joint variable $(X, Y)$: $$ (X,Y):\ \{(0,1),(1,0),(1,0),(0,1),(1,1),(0,0),(0,1),(1,0)\}. $$
The joint observations can then be naturally encoded, for example, as base-4 states $$ (X,Y):\ \{1,2,2,1,3,0,1,2\}. $$
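This base-4 encoding can be reproduced with the `encode` utility described later in this vignette; each joint observation $(x, y)$ maps to $2x + y$ under the big-endian scheme:

```r
library(rinform)

# Each joint observation (x, y) of two Boolean variables maps to the
# base-4 state 2 * x + y under the big-endian encoding scheme.
encode(c(0, 0), b = 2)  # 0
encode(c(0, 1), b = 2)  # 1
encode(c(1, 0), b = 2)  # 2
encode(c(1, 1), b = 2)  # 3
```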
We refer to this process of mapping the observations of $X$ and $Y$ to the encoded observations of $(X, Y)$ as black-boxing. In this case, the black-boxing procedure is performed in "space" (you might think of $X$ and $Y$ as having locations in space). We may also black-box in time. The canonical example of this is considering the $k$-history of a random variable: $$ X:\ \{0,1,1,0,1,0,0,1\} \\ X^{(2)}:\ \{(0,1),(1,1),(1,0),(0,1),(1,0),(0,0),(0,1)\}, $$ and the observations of $X^{(2)}$ can be encoded as base-4 states: $$ X^{(2)}:\ \{1,3,2,1,2,0,1\}. $$
We provide a basic black-boxing function, `black_box`, that allows the user to black-box a collection of random variables both in space and into the future and past. The `black_box_parts` function allows the user to black-box time series based on a partitioning scheme (useful in the implementation of integration measures such as Evidence of Integration).
Interface:
Black-box time series from a collection of `l` sources, where each series has the same number of time steps, the same number of initial conditions, and possibly different bases, into a single time series. The base of the resulting time series is given by the product of the bases of each time series in the collection. Black-boxing can be performed in time by providing history lengths `r` and future lengths `s`.
black_box(series, l, r = NULL, s = NULL)
Examples:
We'll black-box two time series with no history or future lengths:
series <- matrix(0, nrow = 8, ncol = 2)
series[, 1] <- c(0, 1, 1, 0, 1, 0, 0, 1)
series[, 2] <- c(1, 0, 0, 1, 1, 0, 1, 0)
black_box(series, l = 2)
Note that this is the first example described at the beginning of this section.
Black-box a single time series in time with history length 2:
series <- c(0, 1, 1, 0, 1, 0, 0, 1)
black_box(series, l = 1, r = 2)
This is the second example described at the beginning of this section.
Black-box two time series with histories and futures:
series <- matrix(0, nrow = 8, ncol = 2)
series[, 1] <- c(0, 1, 1, 0, 1, 0, 0, 1)
series[, 2] <- c(1, 0, 0, 1, 1, 0, 1, 0)
black_box(series, l = 2, r = c(2, 1), s = c(0, 1))
Interface:
Black-box time series from a collection of $l$ sources, where each series has the same number of time steps but possibly different bases, into a number of different time series according to the partitioning scheme `parts`. The resulting time series and their bases are returned. The base of each resulting time series is given by the product of the bases of the time series in the corresponding partition.
black_box_parts(series, parts)
Examples:
Black-box 4 time series (each of length 8) into a single time series:
xs <- matrix(0, nrow = 8, ncol = 4)
xs[, 1] <- c(0, 1, 1, 0, 1, 0, 0, 1)
xs[, 2] <- c(1, 0, 0, 1, 1, 0, 1, 0)
xs[, 3] <- c(0, 0, 0, 1, 1, 1, 0, 0)
xs[, 4] <- c(1, 0, 1, 0, 1, 1, 1, 0)
parts <- c(1, 1, 1, 1)
black_box_parts(xs, parts)
This could have been done more simply using `black_box`, but it is shown here for illustrative purposes.
Black-box 4 time series (each of length 8) into two time series using the partitioning scheme $(1,2,2,1)$; that is, combine the 1st and 4th, and the 2nd and 3rd.
parts <- c(1, 2, 2, 1)
x <- black_box_parts(xs, parts)
x
Note that the two resulting time series each have a base of 4, and the bases are returned as `x$b`.
Finally, in this example, we compute the multivariate mutual information between 4 time series (each of length 8) for all partitionings of the system (except the very first, since it yields only a single time series). That is, each partition is treated as a separate variable and the mutual information is computed between the partitioned variables.
xs <- matrix(0, nrow = 8, ncol = 4)
xs[, 1] <- c(0, 1, 1, 0, 1, 0, 0, 1)
xs[, 2] <- c(1, 0, 0, 1, 1, 0, 1, 0)
xs[, 3] <- c(0, 0, 0, 1, 1, 1, 0, 0)
xs[, 4] <- c(1, 0, 1, 0, 1, 1, 1, 0)
all_parts <- partitioning(4)
mmi <- rep(0, dim(all_parts)[2] - 1)
for (i in 2:dim(all_parts)[2]) {
  x <- black_box_parts(xs, all_parts[, i])
  mmi[i - 1] <- mutual_info(x$box)
}
round(mmi, 3)
The magic of information measures is that the actual values of a time series are irrelevant. For example, $\{0, 1, 0, 1, 1\}$ has the same entropy as $\{2, 9, 2, 9, 9\}$ (possibly up to a rescaling). This gives us the freedom to shift around the values of a time series as long as we do not change the relative number of states.
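This invariance is easy to check with, for instance, the `block_entropy` measure from this package; both series have two states with relative frequencies $2/5$ and $3/5$:

```r
library(rinform)

# Entropy depends only on the relative frequencies of the states,
# not on their labels, so these two series have equal block entropy.
block_entropy(c(0, 1, 0, 1, 1), k = 1)
block_entropy(c(2, 9, 2, 9, 9), k = 1)
```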
This function thus provides a way of "compressing" a time series into as small a base as possible. Why is this useful? Many of the measures use the base of the time series to determine how much memory to allocate; the larger the base, the higher the memory usage. It also affects the overall performance as the combinatorics climb exponentially with the base.
Notice that the encoding that is used ensures that the ordering of the states stays the same, e.g., $$ \{-8 \rightarrow 0, -2 \rightarrow 1, 2 \rightarrow 2, 4 \rightarrow 3, 6 \rightarrow 4\}. $$
Interface:
Coalesce a time series into as few contiguous states as possible.
coalesce(series)
Examples:
The two standard usage cases for this function are to reduce the base of a time series:
coalesce(c(0, 2, 0, 2, 0, 2))
and to ensure that the states are nonnegative:
coalesce(c(-8, 2, 6, -2, 4))
State encoding is a necessity when complex time series are being analyzed. For example, a `k`-history must be encoded as an integer in order to "observe" it using a Dist. What if you are interested in correlating the aggregate state of one group of nodes with that of another? You'd need to encode the groups' states as integers. The following functions provide such functionality.
This function uses a big-endian encoding scheme. That is, the most significant bits of the encoded integer are determined by the left-most end of the unencoded state. If `b` is not provided (or is `NA`), the base is inferred from the state, with a minimum value of 2.
Interface:
Encode a base-`b` array of integers into a single integer.
encode(state, b = NA)
Examples:
Using the bigendian encoding scheme, the most significant bits of the encoded integer are determined by the leftmost end of the unencoded state:
encode(c(0, 0, 1), b = 2)
encode(c(0, 1, 0), b = 3)
encode(c(1, 0, 0), b = 4)
encode(c(1, 0, 4), b = 5)
If `b` is not provided or is `NA`, it is inferred from the state:
encode(c(0, 0, 2), NA)
encode(c(0, 2, 0))
encode(c(1, 2, 1))
The provided encoded state is decoded using the big-endian encoding scheme. Note that the base `b` must be provided, but the number of digits `n` is optional. If `n` is provided, then the decoded state will have exactly that many elements.
Interface:
Decode an integer into a base-`b` array with `n` digits.
decode(encoding, b, n = NA)
Examples:
This function assumes a bigendian encoding scheme:
decode(2, b = 2, n = 2)
decode(6, b = 2, n = 3)
decode(6, b = 3, n = 2)
If `n` is provided, then the decoded state will have exactly that many elements. However, note that if the provided `n` is too small to contain a full representation of the state, an error will be raised.
decode(2, b = 2, n = 4)
If `n` is not provided, the length of the decoded state is inferred from `encoding` to be as small as possible:
decode(1, b = 2)
decode(1, b = 3)
decode(3, b = 2)
decode(3, b = 3)
decode(3, b = 4)
Many analyses of complex systems consider partitioning the system into components or modules. One example of this is Evidence of Integration.
For generality, we represent a partitioning of $N$ items into $1 \leq M \leq N$ partitions as a sequence of integers $p_1, \ldots, p_N$ where $1 \leq p_i \leq M$ is the partition to which the $i$th item belongs.
As an example, suppose we partitioned $\{X_1, X_2, X_3\}$ as $\{\{X_1\}, \{X_2, X_3\}\}$. This would be represented as $(1, 2, 2)$ because $X_1$ belongs to the first partition and $X_2, X_3$ belong to the second partition.
We provide a function, `partitioning`, to facilitate the generation of all unique partitionings of a system of $N$ elements.
Interface:
Partition a number `n` of items into all possible different subsets. The number of partitions is given by the Bell number $B_n$.
partitioning(n)
Examples:
Say we wish to partition a set containing 3 items:
partitioning(3)
You'll notice that the number of partitions, i.e., the number of columns, is equal to the Bell number $B_3 = 5$.
We can compute the 9th Bell number as follows:
dim(partitioning(9))[2]
Some information measures are defined on transition probability matrices (TPMs) as, for example, Effective Information. For this reason, we provide a function, `series_to_tpm`, which estimates the one-time-step TPM from time series data.
Interface:
Estimate the one-time-step transition probability matrix $A$ from a time series. The element $A_{ji}$ is the probability of transitioning to state $j$ in the next time step given that the system is in state $i$ (note the column-major convention).
series_to_tpm(series)
Examples:
Estimate a TPM from one time series:
series <- c(0, 2, 1, 0, 1, 2, 0, 1, 2, 1, 0, 0, 2, 1, 1)
series_to_tpm(series)
Estimate a TPM from multiple time series, in this case seven 2-step series:
series <- matrix(0, nrow = 2, ncol = 7)
series[, 1] <- c(0, 0)
series[, 2] <- c(0, 1)
series[, 3] <- c(0, 1)
series[, 4] <- c(0, 1)
series[, 5] <- c(1, 0)
series[, 6] <- c(1, 1)
series[, 7] <- c(2, 2)
series_to_tpm(series)