mutualinf: Mutual information Based on Multivariate Distributions... In genomaths/usefr: Utility Functions for Statistical Analyses

 mutualinf R Documentation

Mutual information Based on Multivariate Distributions Constructed from Copulas

Description

Computes the mutual information for each pair of x and y marginal values, based on their multivariate distribution constructed from a copula.

Usage

```r
mutualinf(
  x,
  y,
  copula = NULL,
  margins = NULL,
  paramMargins = NULL,
  method = "ml",
  ties.method = "max"
)
```

Arguments

| Argument | Description |
|---|---|
| `x, y` | Marginal variates. |
| `copula` | A copula object of class `Mvdc`, or a string specifying the name of a copula from package `copula-package`. |
| `margins` | A character vector specifying all the parametric marginal distributions. See details below. |
| `paramMargins` | A list in which each component is a list (or numeric vector) of named components giving the parameter values of the marginal distributions. See details below. |
| `method` | A character string specifying the method used to estimate the dependence parameter(s), if the copula needs to be estimated. See `fitCopula`. |

Details

The mutual information for a pair of marginal values x and y is defined as:

`I{x, y} = log(P(x,y)) - (log(P_1(x)) + log(P_2(y)))`

where P(x,y) is the multivariate distribution constructed from a copula, and P_1(x) and P_2(y) are the marginal CDFs.
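The definition above can also be written directly in terms of the copula. The following is a sketch, assuming the joint CDF is obtained by evaluating a copula at the marginal CDF values; the symbols u, v, and C are introduced here for illustration and do not appear in the package documentation.

```latex
\begin{aligned}
u &= P_1(x), \qquad v = P_2(y), \qquad P(x, y) = C(u, v), \\
I\{x, y\} &= \log C(u, v) - \left(\log u + \log v\right)
           = \log \frac{C(u, v)}{u\,v}.
\end{aligned}
```

Under this reading, the independence copula C(u, v) = uv gives `I{x, y} = 0` at every point, which is the reference level against which gains and losses of information are measured.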

The value `I{x, y}` measures the relative dependence/independence of x and y at the specified point.

Notice that the above definition expresses the difference between two uncertainty variations. For values `I{x, y} > 0`, we say that at point (x, y) there is a gain of information for the association of the underlying stochastic processes generating x and y with respect to the independent processes. Conversely, for values `I{x, y} < 0`, we say that at point (x, y) there is a loss of information for the association of the underlying stochastic processes generating x and y with respect to the independent processes or, equivalently, a gain of information for the independent processes with respect to their association.
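The sign interpretation can be checked by hand with a small base-R sketch. It does not call `mutualinf`; it assumes standard normal margins and a Farlie-Gumbel-Morgenstern (FGM) copula, chosen only because its CDF has a closed form. The helper names `fgm_cdf` and `pointwise_mi` are hypothetical and not part of the package.

```r
# FGM copula CDF: C(u, v) = u * v * (1 + theta * (1 - u) * (1 - v)),
# valid for theta in [-1, 1]. Used here purely for illustration.
fgm_cdf <- function(u, v, theta) {
    stopifnot(abs(theta) <= 1)
    u * v * (1 + theta * (1 - u) * (1 - v))
}

# I{x, y} = log(P(x, y)) - (log(P_1(x)) + log(P_2(y))),
# with standard normal margins (an assumption of this sketch).
pointwise_mi <- function(x, y, theta) {
    u <- pnorm(x) # P_1(x)
    v <- pnorm(y) # P_2(y)
    log(fgm_cdf(u, v, theta)) - (log(u) + log(v))
}

x <- c(-1, 0, 1)
y <- c(-1, 0, 1)

mi_pos <- pointwise_mi(x, y, theta = 0.8)  # positive dependence
mi_ind <- pointwise_mi(x, y, theta = 0)    # independence
mi_neg <- pointwise_mi(x, y, theta = -0.8) # negative dependence

print(data.frame(x, y, mi_pos, mi_ind, mi_neg))
```

With this copula, positive `theta` yields `I{x, y} > 0` at every point, negative `theta` yields `I{x, y} < 0`, and `theta = 0` (independence) yields zero up to floating-point error.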

Value

A list with a data frame carrying the estimated mutual information for each (x, y) pair, the joint and marginal probabilities, and the "mvdc" copula object.

See Also

`ppCplot`, `bicopulaGOF`, `gofCopula`, `fitCDF`, `fitdistr`, and `fitMixDist`.

Examples

```r
require(stats)
library(usefr) # provides rweibull3p, fitCDF, mcgoftest, and mutualinf

set.seed(12) # set a seed for random number generation

## Random generation of a Normal distributed marginal variate
X <- rnorm(2000, mean = 1, sd = 0.2)

## Random generation of a Weibull-3P distributed marginal variate
Y <- X + rweibull3p(2000, shape = 2, scale = 0.85, mu = 1)

## Correlation test
cor.test(X, Y, method = "spearman")

## Non-linear model fit for 'Y' distribution values
fitY <- fitCDF(Y, distNames = 12) # 3P Weibull distribution model
coefs <- coef(fitY$bestfit) # model coefficients

## Goodness-of-fit test for the Weibull-3P distribution model
mcgoftest(
    varobj = Y, distr = "weibull3p", pars = coefs, num.sampl = 99,
    sample.size = 1999, stat = "chisq", num.cores = 4, breaks = 200,
    seed = 123
)

## Settings to estimate the mutual information
margins <- c("norm", "weibull3p")
parMargins <- list(
    list(mean = 1, sd = 0.2),
    as.list(coefs)
) # Notice "as.list" is used here, not "list"

## Finally, estimation of the mutual information
mutual.Inf <- mutualinf(
    x = X, y = Y, copula = "normalCopula",
    margins = margins, paramMargins = parMargins
)
```