This function imputes quantitative missing data by using Nearest Neighbour Imputation (NNI) with the Mahalanobis distance in a forward and sequential step-by-step process that starts from the complete part of data.
ForImp.Mahala(mat, probs=seq(0, 1, 0.1), q="10%", add.unit=TRUE, squared=FALSE,
tol=1e-6)
mat
a quantitative data matrix with missing entries.
probs
vector of probabilities with values in [0, 1] for computing quantiles of Mahalanobis distances in selection of donors. Default option:
q
string of the form
add.unit
a logical value. If
squared
a logical value indicating if the Mahalanobis distance has to be used (
tol
tolerance factor introduced to prevent numerical problems occurring when distances of complete units are equal to the chosen quantile.
ForImp.Mahala is a forward imputation method, alternative to the ForImp.PCA procedure, for imputing quantitative missing data (see ForImp.PCA). It does not include Stage 1, since it works directly on the original variables. In Stage 2, the basic metric for the NNI method is the Mahalanobis distance. Steps 2 to 3 are therefore iteratively repeated until the starting data matrix is completely imputed.
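To make the idea of a Mahalanobis-based NNI step concrete, the following is a minimal base-R sketch of a single imputation step on toy data. It is not the package's implementation: the toy objects (complete, incomplete), the use of the donors' mean as the imputed value, and the way the quantile label "10%" and the tolerance are applied are all assumptions made for illustration.

```r
# Toy data (hypothetical): 6 complete units and 1 unit missing variable 3
set.seed(42)
complete   <- matrix(rnorm(18), nrow = 6, ncol = 3)
incomplete <- c(0.1, -0.3, NA)

obs <- !is.na(incomplete)   # variables observed on the incomplete unit

# Mahalanobis distances from the incomplete unit to every complete unit,
# computed on the observed variables only.
# stats::mahalanobis() returns SQUARED distances, hence the sqrt().
S <- cov(complete[, obs, drop = FALSE])
d <- sqrt(mahalanobis(complete[, obs, drop = FALSE],
                      center = incomplete[obs], cov = S))

# Donors (assumed rule): complete units whose distance does not exceed the
# chosen quantile; the tolerance guards against ties lost to rounding error
tol <- 1e-6
threshold <- quantile(d, probs = seq(0, 1, 0.1))["10%"]
donors <- which(d <= threshold + tol)

# Impute the missing entry with the donors' mean (an assumed choice)
imputed <- incomplete
imputed[!obs] <- colMeans(complete[donors, !obs, drop = FALSE])
imputed
```

In a forward scheme, the newly completed unit would then join the complete part before the next incomplete unit is processed.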
Unlike ForImp.PCA, the ForImp.Mahala procedure requires the number n of units to be greater than or equal to the number p of variables at every step of the procedure; otherwise the covariance matrix involved in the Mahalanobis distance is not invertible.
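The effect of having too few units can be checked directly in base R. The sketch below, on simulated data with hypothetical object names (few, many), compares the rank of the sample covariance matrix when n is smaller than p with the rank when n is comfortably larger; a rank below p means the matrix cannot be inverted.

```r
set.seed(1)
p <- 5
few  <- matrix(rnorm(3 * p),  nrow = 3)    # n = 3  <  p
many <- matrix(rnorm(50 * p), nrow = 50)   # n = 50 >= p

# Numerical rank of each p x p sample covariance matrix
rank_few  <- qr(cov(few))$rank
rank_many <- qr(cov(many))$rank

rank_few < p     # TRUE: rank-deficient, so not invertible
rank_many == p   # TRUE: full rank, so invertible
```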
For further details, see the references below.
The imputed data matrix.
Nadia Solaro, Alessandro Barbiero, Giancarlo Manzi, Pier Alda Ferrari
Solaro, N., Barbiero, A., Manzi, G., Ferrari, P.A. (2014). Algorithmic-type imputation techniques with different data structures: Alternative approaches in comparison. In: Vicari, D., Okada, A., Ragozini, G., Weihs, C. (eds), Analysis and modeling of complex data in behavioural and social sciences, Studies in Classification, Data Analysis, and Knowledge Organization. Springer International Publishing, Cham (CH): 253-261.
Solaro, N., Barbiero, A., Manzi, G., Ferrari, P.A. (2015). A sequential distance-based approach for imputing missing data: The Forward Imputation. Under review.
# EXAMPLE with multivariate normal data (MVN)
library(GenForImp)  # assumed host package of ForImp.Mahala and missing.gen
library(mvtnorm)    # provides rmvnorm
# number of variables
p <- 5
# correlation matrix
rho <- 0.8
Rho <- matrix(rho, p, p)
diag(Rho) <- 1
Rho
# mean vector
vmean <- rep(0,p)
vmean
# number of units
n <- 1000
# percentage of missing values
percmiss <- 0.2
nummiss <- n*p*percmiss
nummiss
# generation of a complete matrix
set.seed(1)
x0 <- rmvnorm(n, mean=vmean, sigma=Rho)
x0
# generating a matrix with missing data
x <- missing.gen(x0, nummiss)
# imputing missing values
xForImpMahala <- ForImp.Mahala(x)
xForImpMahala
# computing the Relative Mean Square Error
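# (sweep divides each column by its own variance; a plain "/" would recycle
# the variance vector down the rows instead)
error <- sum(sweep((x0 - xForImpMahala)^2, 2, diag(var(x0)), "/")) / n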
error
# EXAMPLE with real data
data(airquality)
m0 <- airquality
m0
# selecting the first 4 columns, with quantitative data
m <- m0[, 1:4]
m
# imputation
mi <- ForImp.Mahala(m)
mi
# plot of imputed values for variable "Ozone"
ozone.miss.ind <- which(is.na(m)[,1])
plot(mi[ozone.miss.ind,1], axes=FALSE, pch=19, ylab="imputed values of Ozone",
xlab="observation index")
axis(2)
axis(1, at=1:length(ozone.miss.ind), labels=ozone.miss.ind, las=2)
box()
abline(v=1:length(ozone.miss.ind), lty=3, col="grey")