This vignette shows how to model multivariate distributions with
univariateML and the
A copula is a function describing the dependency among a set of one-dimensional distributions. If both the marginal distributions and the copula is known, the entire multivariate distribution is known too.
Suppose we look at a multivariate distribution as the pair of a copula and its marginals. Then a natural model selection method is to
This two-step procedure is commonly used due to its simplicity. The procedure must be carried out this order since the marginal data cannot be transformed to the unit interval unless we know the marginal distributions.
univariateML can be used for task 1, while the
copula package can be used to do task 2.
abalone data set is included in this package. It consists of $9$ physical
measurements of $4177$ sea snails.
Following @ko2019focused we will take a look at four measurements of the abalones,
age. The variable
age is not
present in the
abalone data, but is defined as
age = rings + 1.5. Moreover,
there are two outliers in the
height data at at $1.13$ and $0.52$. We will
remove these outliers and all columns we don't need in the following.
data = dplyr::filter(abalone, height < 0.5) data$age = data$rings + 1.5 data = data[c("diameter", "height", "shell_weight", "age")] hist(data$height, main = "Abalone height", xlab = "Height in mm")
Let's continue doing step 1. First we must decide on a set of models to try out.
models = c("gumbel", "laplace", "logis", "norm", "exp", "gamma", "invgamma", "invgauss", "invweibull", "llogis", "lnorm", "rayleigh", "weibull", "lgamma", "pareto", "beta", "kumar", "logitnorm") length(models)
Optionally, we can use all implemented models with
The next step is to fit all models, compute the AIC, and select the best model.
This is exactly what
margin_fits <- lapply(data, model_select, models = models, criterion = "aic")
Now we use the
fitCopula from the package
copula on the transformed margins of
We will examine two elliptical copulas and three Archimedean copulas. The elliptical copulas are the Gaussian copula and the t-copula, while the Archimedean copulas are the Joe copula, the Clayton copula, and the Gumbel copula.
# Transform the marginals to the unit interval. y = sapply(seq_along(data), function(j) pml(data[[j]], margin_fits[[j]])) # The copulas described above. copulas = list(normal = copula::normalCopula(dim = 4, dispstr = "un"), t = copula::tCopula(dim = 4, dispstr = "un"), joe = copula::joeCopula(dim = 4), clayton = copula::claytonCopula(dim = 4), gumbel = copula::gumbelCopula(dim = 4)) fits = sapply(copulas, function(x) copula::fitCopula(x, data = y, method = "mpl")) sapply(fits, AIC)
The t-copula is the clear winner of the AIC competition. The Archimedean copulas perform particularly poorly.
Hence our final model is the t-copula with Kumaraswamy (
mlkumar) marginal distribution for
diameter, normal marginal distribution for
height, Weibull marginal distribution for
and log normal marginal distribution for
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.