CoImp: Copula-Based Imputation Method

Description

Imputation method based on conditional copula functions.

Usage

CoImp(X, n.marg = ncol(X), x.up = NULL, x.lo = NULL, q.up = NULL,
    q.lo = NULL, type.data = "continuous", smoothing = rep(0.5, n.marg),
    plot = TRUE, model = list(normalCopula(0.5, dim = n.marg, dispstr = "ex"),
    claytonCopula(10, dim = n.marg), gumbelCopula(10, dim = n.marg),
    frankCopula(10, dim = n.marg)), ...)

Arguments

X

a data matrix with missing values. Missing values should be denoted with NA.

n.marg

the number of variables in X.

x.up

a vector of length n.marg with the upper value of each margin used in the Hit or Miss method.

x.lo

a vector of length n.marg with the lower value of each margin used in the Hit or Miss method.

q.up

a vector of length n.marg with the probability of the quantile function used to define x.up for each margin.

q.lo

a vector of length n.marg with the probability of the quantile function used to define x.lo for each margin.

type.data

the nature of the variables in X: discrete or continuous.

smoothing

values for the nearest neighbour component of the smoothing parameter of the lp function.

plot

logical: if TRUE, plots the estimated marginal densities and a bar plot of the percentages of missing and available data for each margin.

model

a list of copula models to be used for the imputation, see the Details section. Each element should be a normal, Frank, Clayton or Gumbel copula.

...

further parameters for fitCopula, lp and further graphical arguments.
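The relation between q.lo, q.up and x.lo, x.up described above can be illustrated with base R. This is a minimal sketch that assumes the bounds are the empirical quantiles of the observed values of each margin; whether CoImp computes them in exactly this way is an assumption, and the data below are simulated for illustration only.

```r
# Illustrative data: two margins with some values set to NA
set.seed(1)
X <- cbind(rnorm(50, mean = 7, sd = 2), rgamma(50, shape = 3, rate = 2))
X[sample(length(X), 10)] <- NA

n.marg <- ncol(X)
q.lo <- rep(0.1, n.marg)   # lower quantile probability for each margin
q.up <- rep(0.9, n.marg)   # upper quantile probability for each margin

# Assumed mapping: bounds are empirical quantiles of the available data
x.lo <- sapply(seq_len(n.marg), function(j)
  quantile(X[, j], probs = q.lo[j], na.rm = TRUE, names = FALSE))
x.up <- sapply(seq_len(n.marg), function(j)
  quantile(X[, j], probs = q.up[j], na.rm = TRUE, names = FALSE))

cbind(x.lo, x.up)
```

Under this reading, supplying q.lo and q.up is a way to set the sampling range of the Hit or Miss method without choosing x.lo and x.up by hand.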

Details

CoImp is an imputation method based on conditional copula functions. It imputes missing observations according to the multivariate dependence structure of the generating process, without any assumptions on the margins. The method can be used regardless of the dimension and the kind (monotone or non-monotone) of the missing-data pattern.
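To make the distinction between monotone and non-monotone missing patterns concrete, the following base R sketch checks a data matrix for monotonicity. The helper is_monotone_pattern is hypothetical and is not part of CoImp; it only illustrates the definition.

```r
# Hypothetical helper (not part of CoImp): returns TRUE if the missing-data
# pattern of X is monotone, i.e. the columns can be ordered so that, in every
# row, a missing entry implies that all later entries are missing too.
is_monotone_pattern <- function(X) {
  R   <- is.na(X)                    # missingness indicator matrix
  ord <- order(colSums(R))           # columns from least to most missing
  Rs  <- R[, ord, drop = FALSE]
  # within each row the indicator must never switch from TRUE back to FALSE
  all(apply(Rs, 1, function(r) all(diff(r) >= 0)))
}

X1 <- matrix(1:12, nrow = 4)
X1[3:4, 2] <- NA
X1[2:4, 3] <- NA                     # nested missingness: monotone

X2 <- matrix(1:12, nrow = 4)
X2[1, 2] <- NA
X2[2, 3] <- NA                       # scattered missingness: non-monotone

is_monotone_pattern(X1)  # TRUE
is_monotone_pattern(X2)  # FALSE
```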

Brief description of the approach:

  1. estimate both the margins and the copula model on available data by means of the semi-parametric sequential two-step inference for margins;

  2. derive conditional density functions of the missing variables given non-missing ones through the corresponding conditional copulas obtained using Bayes' rule;

  3. impute missing values by drawing observations from the conditional density functions derived at the previous step. The Monte Carlo method used is Hit or Miss.
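The Hit or Miss draw in step 3 can be sketched in base R as follows. This is an illustration of the general technique on a univariate density over a bounded interval, not the package's internal code: the function hit_or_miss, its arguments, and the grid-based approximation of the density maximum are all assumptions made for this example. The interval [lo, up] plays the role of x.lo and x.up.

```r
# Hypothetical sketch of Hit or Miss (acceptance-rejection) sampling from a
# density 'dens' restricted to [lo, up]. Not the CoImp implementation.
hit_or_miss <- function(dens, lo, up, n_draws = 1, grid_size = 1000) {
  # approximate the maximum of the density on a grid over [lo, up]
  grid     <- seq(lo, up, length.out = grid_size)
  dens_max <- max(dens(grid))
  draws <- numeric(0)
  while (length(draws) < n_draws) {
    x <- runif(n_draws, lo, up)          # candidate points
    y <- runif(n_draws, 0, dens_max)     # uniform heights
    draws <- c(draws, x[y <= dens(x)])   # keep the "hits" under the curve
  }
  draws[seq_len(n_draws)]
}

set.seed(1)
z <- hit_or_miss(function(x) dbeta(x, 2, 3), lo = 0, up = 1, n_draws = 5000)
mean(z)  # should be close to the Beta(2, 3) mean, 2 / (2 + 3) = 0.4
```

In the imputation setting, dens would be the conditional density of a missing variable given the observed ones, as derived in step 2.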

The estimation approach for the copula fit is semiparametric: a range of nonparametric margins and parametric copula models can be selected by the user.

Value

An object of S4 class "CoImp", which is a list with the following elements:

Missing.data.matrix

the original missing data matrix to be imputed.

Perc.miss

the matrix of the percentage of missing and available data.

Estimated.Model

the estimated copula model on the available data.

Estimation.Method

the estimation method used for the copula Estimated.Model.

Index.matrix.NA

matrix indices of the missing data.

Smooth.param

the smoothing parameter alpha selected on the basis of the AIC.

Imputed.data.matrix

the imputed data matrix.

Estimated.Model.Imp

the estimated copula model on the imputed data matrix.

Estimation.Method.Imp

the estimation method used for the copula Estimated.Model.Imp.

Author(s)

Francesca Marta Lilja Di Lascio <[email protected]>,

Simone Giannerini <[email protected]>

References

Di Lascio, F.M.L., Giannerini, S. and Reale, A. (201x) "A multivariate technique based on conditional copula specification for the imputation of complex dependent data". Working paper.

Di Lascio, F.M.L., Giannerini, S. and Reale, A. (2015) "Exploring Copulas for the Imputation of Complex Dependent Data". Statistical Methods & Applications, 24(1), p. 159-175. DOI 10.1007/s10260-014-0287-2.

Di Lascio, F.M.L., Giannerini, S. and Reale, A. (2014) "Imputation of complex dependent data by conditional copulas: analytic versus semiparametric approach", Book of proceedings of the 21st International Conference on Computational Statistics (COMPSTAT 2014), p. 491-497. ISBN 9782839913478.

Bianchi, G., Di Lascio, F.M.L., Giannerini, S., Manzari, A., Reale, A. and Ruocco, G. (2009) "Exploring copulas for the imputation of missing nonlinearly dependent data". Proceedings of the VII Meeting Classification and Data Analysis Group of the Italian Statistical Society (Cladag), Editors: Salvatore Ingrassia and Roberto Rocci, Cleup, p. 429-432. ISBN 978-88-6129-406-6.

Examples

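## load the packages (copula provides frankCopula, claytonCopula, mvdc and rMvdc)
library(copula)
library(CoImp)

## generate data from a 4-variate Frank copula with different margins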

set.seed(11)
n.marg <- 4
theta  <- 5
copula <- frankCopula(theta, dim = n.marg)
mymvdc <- mvdc(copula, c("norm", "gamma", "beta", "gamma"),
            list(list(mean=7, sd=2), list(shape=3, rate=2),
                 list(shape1=4, shape2=1), list(shape=4, rate=3)))
n      <- 20
x.samp <- copula::rMvdc(n, mymvdc)

# randomly introduce univariate and multivariate missing values

perc.mis    <- 0.5
set.seed(11)
miss.row    <- sample(1:n, perc.mis*n, replace=TRUE)
miss.col    <- sample(1:n.marg, perc.mis*n, replace=TRUE)
miss        <- cbind(miss.row,miss.col)
x.samp.miss <- replace(x.samp,miss,NA)

# impute missing values

imp <- CoImp(x.samp.miss, n.marg=n.marg, smoothing = rep(0.6,n.marg), plot=TRUE,
       type.data="continuous")

# methods show and plot

show(imp)
plot(imp)

## generate data from a 3-variate Clayton copula with different bounded margins

set.seed(11)
n.marg <- 3
theta  <- 5
copula <- claytonCopula(theta, dim = n.marg)
mymvdc <- mvdc(copula, c("beta", "beta", "beta"), list(list(shape1=4, shape2=1),
            list(shape1=.5, shape2=.5), list(shape1=2, shape2=3)))
n      <- 100
x.samp <- copula::rMvdc(n, mymvdc)

# randomly introduce univariate and multivariate missing values

perc.mis    <- 0.2
set.seed(11)
miss.row    <- sample(1:n, perc.mis*n, replace=TRUE)
miss.col    <- sample(1:n.marg, perc.mis*n, replace=TRUE)
miss        <- cbind(miss.row,miss.col)
x.samp.miss <- replace(x.samp,miss,NA)

# impute missing values

imp <- CoImp(x.samp.miss, n.marg=n.marg, smoothing = c(0.45,0.2,0.5), plot=TRUE,
        q.lo=rep(0.1,n.marg), q.up=rep(0.9,n.marg))

# methods show and plot

show(imp)
plot(imp)

CoImp documentation built on May 29, 2017, 6:38 p.m.