# CoImp: Copula-Based Imputation Method

## Description

Imputation method based on conditional copula functions.

## Usage

```r
CoImp(X, n.marg = ncol(X), x.up = NULL, x.lo = NULL, q.up = NULL,
      q.lo = NULL, type.data = "continuous", smoothing = rep(0.5, n.marg),
      plot = TRUE,
      model = list(normalCopula(0.5, dim = n.marg, dispstr = "ex"),
                   claytonCopula(10, dim = n.marg),
                   gumbelCopula(10, dim = n.marg),
                   frankCopula(10, dim = n.marg)), ...)
```

## Arguments

- `X`: a data matrix with missing values. Missing values should be denoted with `NA`.
- `n.marg`: the number of variables in `X`.
- `x.up`: a vector of length `n.marg` with the upper value of each margin used in the Hit or Miss method.
- `x.lo`: a vector of length `n.marg` with the lower value of each margin used in the Hit or Miss method.
- `q.up`: a vector of length `n.marg` with the probability of the quantile function used to define `x.up` for each margin.
- `q.lo`: a vector of length `n.marg` with the probability of the quantile function used to define `x.lo` for each margin.
- `type.data`: the nature of the variables in `X`: `discrete` or `continuous`.
- `smoothing`: values for the nearest neighbour component of the smoothing parameter of the `lp` function.
- `plot`: logical. If `TRUE`, plots the estimated marginal densities and a bar plot of the percentages of missing and available data for each margin.
- `model`: a list of copula models to be used for the imputation; see the Details section. The supported families are `normal`, `frank`, `clayton` and `gumbel`.
- `...`: further parameters for `fitCopula` and `lp`, and further graphical arguments.
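To illustrate how `q.lo` and `q.up` relate to `x.lo` and `x.up`, the sketch below turns quantile probabilities into per-margin bounds using base R. It assumes the bounds are empirical quantiles of the observed (non-missing) values; this page does not state how CoImp computes them internally, and the toy data are made up for the example.

```r
set.seed(11)
# Toy data matrix with two margins and a few missing values (illustrative only).
X <- cbind(rnorm(50, mean = 7, sd = 2), rgamma(50, shape = 3, rate = 2))
X[sample(length(X), 10)] <- NA

# One probability per margin, as expected by q.lo and q.up.
q.lo <- rep(0.1, ncol(X))
q.up <- rep(0.9, ncol(X))

# Assumed mapping: empirical quantiles of the available data in each column.
x.lo <- sapply(seq_len(ncol(X)),
               function(j) unname(quantile(X[, j], probs = q.lo[j], na.rm = TRUE)))
x.up <- sapply(seq_len(ncol(X)),
               function(j) unname(quantile(X[, j], probs = q.up[j], na.rm = TRUE)))

cbind(x.lo, x.up)
```

Either pair can then be passed to `CoImp`: explicit values through `x.lo` and `x.up`, or probabilities through `q.lo` and `q.up`.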

## Details

CoImp is an imputation method based on conditional copula functions. It imputes missing observations according to the multivariate dependence structure of the generating process, without any assumptions on the margins. The method can be used regardless of the dimension and of the kind (monotone or non-monotone) of the missing patterns.

Brief description of the approach:

1. estimate both the margins and the copula model on available data by means of the semi-parametric sequential two-step inference for margins;

2. derive the conditional density functions of the missing variables given the non-missing ones through the corresponding conditional copulas, obtained by using Bayes' rule;

3. impute missing values by drawing observations from the conditional density functions derived at the previous step. The Monte Carlo method used is the Hit or Miss.
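The Hit or Miss draw in step 3 can be sketched in base R as a simple acceptance-rejection sampler on a bounded interval. This is a minimal illustration, not CoImp's internal code: the function name `hit.or.miss`, its arguments, and the stand-in density `cond.dens` are hypothetical, and the interval `[lo, up]` plays the role of `x.lo` and `x.up` for one margin.

```r
# Hit or Miss (acceptance-rejection) draw from an unnormalised density on [lo, up].
hit.or.miss <- function(dens, lo, up, n.draw = 1, n.grid = 1000) {
  # Approximate an upper bound of the density on a grid.
  # Assumes dens is bounded and reasonably smooth on [lo, up].
  grid  <- seq(lo, up, length.out = n.grid)
  d.max <- max(dens(grid))
  draws <- numeric(0)
  while (length(draws) < n.draw) {
    x <- runif(n.draw, lo, up)          # candidate values
    y <- runif(n.draw, 0, d.max)        # uniform heights
    draws <- c(draws, x[y <= dens(x)])  # keep the "hits" under the curve
  }
  draws[seq_len(n.draw)]
}

set.seed(11)
# Stand-in for a conditional density: the kernel of a Beta(4, 2) on [0, 1].
cond.dens <- function(x) x^3 * (1 - x)
imputed <- hit.or.miss(cond.dens, lo = 0, up = 1, n.draw = 500)
summary(imputed)
```

Because only points under the curve are accepted, the density does not need to be normalised, which suits conditional densities that are known only up to a constant.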

The estimation approach for the copula fit is semiparametric: a range of nonparametric margins and parametric copula models can be selected by the user.
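A semiparametric two-step fit of this kind can be sketched with the `copula` package. The example assumes rank-based pseudo-observations (`pobs`) for the margins and maximum pseudo-likelihood (`method = "mpl"`) for the copula parameter; CoImp itself estimates the marginal densities through `lp`, so this is an illustration of the general idea rather than the package's exact procedure. The simulated data and parameter values are made up.

```r
library(copula)

set.seed(1)
# Simulate complete data from a 3-variate Frank copula (illustrative values).
true.cop <- frankCopula(5, dim = 3)
u.true   <- rCopula(200, true.cop)
# Give the margins arbitrary shapes; the rank transform below ignores them.
x <- cbind(qnorm(u.true[, 1], mean = 7, sd = 2),
           qgamma(u.true[, 2], shape = 3, rate = 2),
           qbeta(u.true[, 3], shape1 = 4, shape2 = 1))

# Step 1: nonparametric margins via pseudo-observations (scaled ranks in (0, 1)).
u <- pobs(x)

# Step 2: fit a parametric copula to the pseudo-observations.
# The parameter 2 in frankCopula() only serves as a starting value.
fit <- fitCopula(frankCopula(2, dim = 3), data = u, method = "mpl")
coef(fit)
```

Swapping `frankCopula` for `claytonCopula`, `gumbelCopula` or `normalCopula` gives the other candidate families listed in the `model` argument.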

## Value

An object of S4 class "CoImp", which is a list with the following elements:

- `Missing.data.matrix`: the original missing data matrix to be imputed.
- `Perc.miss`: the matrix of the percentage of missing and available data.
- `Estimated.Model`: the estimated copula model on the available data.
- `Estimation.Method`: the estimation method used for the copula `Estimated.Model`.
- `Index.matrix.NA`: matrix indices of the missing data.
- `Smooth.param`: the smoothing parameter alpha selected on the basis of the AIC.
- `Imputed.data.matrix`: the imputed data matrix.
- `Estimated.Model.Imp`: the estimated copula model on the imputed data matrix.
- `Estimation.Method.Imp`: the estimation method used for the copula `Estimated.Model.Imp`.
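To make two of these elements concrete, the base R sketch below computes the kind of information described for `Index.matrix.NA` and `Perc.miss` from a small toy matrix. The exact layout that CoImp returns is an assumption here; the variable names `idx.na` and `perc` are hypothetical.

```r
# Toy 4 x 2 data matrix with one missing value per column (illustrative only).
X <- matrix(c(1, NA, 3, 4,
              5, NA, 7, 8), ncol = 2)

# Row and column indices of the missing entries.
idx.na <- which(is.na(X), arr.ind = TRUE)

# Percentage of missing and available data for each margin.
perc <- rbind(missing   = colMeans(is.na(X)) * 100,
              available = colMeans(!is.na(X)) * 100)

idx.na
perc
```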

## Author(s)

Francesca Marta Lilja Di Lascio <[email protected]>,

Simone Giannerini <[email protected]>

## References

Di Lascio, F.M.L., Giannerini, S. and Reale, A. (201x) "A multivariate technique based on conditional copula specification for the imputation of complex dependent data". Working paper.

Di Lascio, F.M.L., Giannerini, S. and Reale, A. (2015) "Exploring Copulas for the Imputation of Complex Dependent Data". Statistical Methods & Applications, 24(1), p. 159-175. DOI 10.1007/s10260-014-0287-2.

Di Lascio, F.M.L., Giannerini, S. and Reale, A. (2014) "Imputation of complex dependent data by conditional copulas: analytic versus semiparametric approach", Book of proceedings of the 21st International Conference on Computational Statistics (COMPSTAT 2014), p. 491-497. ISBN 9782839913478.

Bianchi, G., Di Lascio, F.M.L., Giannerini, S., Manzari, A., Reale, A. and Ruocco, G. (2009) "Exploring copulas for the imputation of missing nonlinearly dependent data". Proceedings of the VII Meeting Classification and Data Analysis Group of the Italian Statistical Society (Cladag), Editors: Salvatore Ingrassia and Roberto Rocci, Cleup, p. 429-432. ISBN: 978-88-6129-406-6.

## Examples

```r
library(copula)
library(CoImp)

## generate data from a 4-variate Frank copula with different margins

set.seed(11)
n.marg <- 4
theta  <- 5
copula <- frankCopula(theta, dim = n.marg)
mymvdc <- mvdc(copula, c("norm", "gamma", "beta", "gamma"),
               list(list(mean = 7, sd = 2), list(shape = 3, rate = 2),
                    list(shape1 = 4, shape2 = 1), list(shape = 4, rate = 3)))
n      <- 20
x.samp <- copula::rMvdc(n, mymvdc)

# randomly introduce univariate and multivariate missing

perc.mis    <- 0.5
set.seed(11)
miss.row    <- sample(1:n, perc.mis*n, replace = TRUE)
miss.col    <- sample(1:n.marg, perc.mis*n, replace = TRUE)
miss        <- cbind(miss.row, miss.col)
x.samp.miss <- replace(x.samp, miss, NA)

# impute missing values

imp <- CoImp(x.samp.miss, n.marg = n.marg, smoothing = rep(0.6, n.marg),
             plot = TRUE, type.data = "continuous")

# methods show and plot

show(imp)
plot(imp)

## generate data from a 3-variate Clayton copula with different bounded margins

set.seed(11)
n.marg <- 3
theta  <- 5
copula <- claytonCopula(theta, dim = n.marg)
mymvdc <- mvdc(copula, c("beta", "beta", "beta"),
               list(list(shape1 = 4, shape2 = 1), list(shape1 = .5, shape2 = .5),
                    list(shape1 = 2, shape2 = 3)))
n      <- 100
x.samp <- copula::rMvdc(n, mymvdc)

# randomly introduce univariate and multivariate missing

perc.mis    <- 0.2
set.seed(11)
miss.row    <- sample(1:n, perc.mis*n, replace = TRUE)
miss.col    <- sample(1:n.marg, perc.mis*n, replace = TRUE)
miss        <- cbind(miss.row, miss.col)
x.samp.miss <- replace(x.samp, miss, NA)

# impute missing values, bounding each margin by its 10% and 90% quantiles

imp <- CoImp(x.samp.miss, n.marg = n.marg, smoothing = c(0.45, 0.2, 0.5),
             plot = TRUE, q.lo = rep(0.1, n.marg), q.up = rep(0.9, n.marg))

# methods show and plot

show(imp)
plot(imp)
```

CoImp documentation built on May 29, 2017, 6:38 p.m.