Impute: Parametric and Non-Parametric Imputation

View source: R/Impute.R

Impute R Documentation

Parametric and Non-Parametric Imputation

Description

This function imputes missing data using one of two methods.

method 'Normal' - Imputes the data assuming that they come from a multivariate normal distribution with mean mu and covariance sig. If mu or sig is not supplied, its maximum likelihood estimate is used. The imputed values are based on the conditional distribution of the missing values given the observed values, mu, and sig; see Jamshidian and Jalal (2010) for more details.

method 'Dist.Free' - Imputes the data nonparametrically using the method of Srivastava and Dolatabadi (2009). Also see Jamshidian and Jalal (2010).
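
As a rough illustration of the conditional distribution behind method 'Normal', the following base R sketch imputes a single case with one missing value. It is not the MissMech implementation: the values of mu, sig, and y are made up for the example, and the helper names (B, cond_mean, cond_cov, draw) are hypothetical. It only shows the standard result that, under multivariate normality, the missing part given the observed part is again normal.

```r
# Hypothetical illustration in base R; not the MissMech implementation.
set.seed(1)
mu  <- c(0, 1, 2)                            # assumed mean vector
sig <- matrix(c(1.0, 0.5, 0.2,
                0.5, 2.0, 0.3,
                0.2, 0.3, 1.5), nrow = 3)    # assumed covariance matrix
y <- c(0.4, NA, 2.5)                         # one case; second value missing

mis <- is.na(y)
obs <- !mis

# Regression coefficients of the missing part on the observed part
B <- sig[mis, obs, drop = FALSE] %*% solve(sig[obs, obs, drop = FALSE])

# Conditional mean and covariance of the missing values given the observed ones
cond_mean <- mu[mis] + B %*% (y[obs] - mu[obs])
cond_cov  <- sig[mis, mis, drop = FALSE] - B %*% sig[obs, mis, drop = FALSE]

# Draw from the conditional normal distribution using a Cholesky factor
draw <- cond_mean + t(chol(cond_cov)) %*% rnorm(sum(mis))

# Fill in the missing entry; observed entries are left untouched
y_imputed <- y
y_imputed[mis] <- as.vector(draw)
```

When mu and sig are left as NA in Impute, the description above says their maximum likelihood estimates take the place of the assumed values used in this sketch.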

Usage

Impute(data, mu = NA, sig = NA, imputation.method = "Normal", resid = NA)

Arguments

data

A matrix consisting of at least two columns. Values must be numerical with missing data indicated by NA.

mu

A vector of population means used to impute the data. By default, the maximum likelihood estimate based on the observed data is used.

sig

The population covariance matrix used to impute the data. By default, the maximum likelihood estimate based on the observed data is used.

imputation.method

'Normal' uses the normal imputation method. 'Dist.Free' uses the nonparametric method. See Jamshidian and Jalal (2010) and Srivastava and Dolatabadi (2009).

resid

User-defined residual vector to be used in place of the residuals proposed by the Srivastava and Dolatabadi (2009) method.

Details

This routine uses OrderMissing to order the data according to missing data patterns. The output consists of the imputed data both in its original order and in the order produced by OrderMissing.

Value

yimp

The imputed data set (in the order of the original data) after rows with no observed data (if any) have been deleted.

yimpOrdered

The imputed data set ordered by OrderMissing according to missing data pattern.

caseorder

A mapping of case indices from the ordered data (yimpOrdered) to the original data. More specifically, the j-th row of the ordered data is the caseorder[j]-th row of the original data, where caseorder[j] is the j-th element of caseorder.

patused

A matrix indicating the missing data patterns in the data set, using 1's (for observed) and NA's (for missing).

patcnt

A vector containing the number of cases corresponding to each pattern in patused.

Note

In the above descriptions, "original data" refers to the input data after deletion of the rows consisting entirely of NA's (if any).
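
The relationship between the original data, the ordered data, and caseorder can be illustrated with a small base R sketch. This is only an approximation under stated assumptions: the toy matrix and the helper names (orig, pat_key, ordered, runs) are hypothetical, and OrderMissing may arrange the patterns in a different order than the string sort used here.

```r
# Hypothetical base R sketch; OrderMissing itself may order patterns differently.
y <- matrix(c( 1, NA,  3,
              NA, NA, NA,    # all-NA row, dropped before ordering
               4,  5,  6,
               7, NA,  9,
              10, 11, 12), ncol = 3, byrow = TRUE)

# "Original data": the input after removing rows made up entirely of NA
orig <- y[rowSums(!is.na(y)) > 0, , drop = FALSE]

# Encode each row's missing data pattern as a string such as "1 NA 1"
pat_key <- apply(orig, 1, function(r) {
  paste(ifelse(is.na(r), "NA", "1"), collapse = " ")
})

# Group cases that share a pattern; caseorder maps ordered rows back to orig
caseorder <- order(pat_key)
ordered   <- orig[caseorder, , drop = FALSE]

# Number of cases per pattern, in the order the patterns appear in 'ordered'
runs   <- rle(pat_key[caseorder])
patcnt <- runs$lengths
```

In this sketch, row j of ordered is row caseorder[j] of orig, which mirrors the mapping described for caseorder above.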

Author(s)

Mortaza Jamshidian, Siavash Jalal, and Camden Jansen

References

Srivastava, M. S. and Dolatabadi, M. (2009). “Multiple imputation and other resampling scheme for imputing missing observations,” Journal of Multivariate Analysis, 100, 1919-1937, doi:10.1016/j.jmva.2009.06.003.

Jamshidian, M. and Jalal, S. (2010). “Tests of homoscedasticity, normality, and missing at random for incomplete multivariate data,” Psychometrika, 75, 649-674, doi:10.1007/s11336-010-9175-3.

Examples

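library(MissMech)

# Simulate a 200 x 4 matrix of standard normal data
set.seed(50)
n <- 200
p <- 4
pctmiss <- 0.2
y <- matrix(rnorm(n * p), nrow = n)
# Set roughly 20% of the entries to NA, completely at random
missing <- matrix(runif(n * p), nrow = n) < pctmiss
y[missing] <- NA

# Impute assuming multivariate normality, then nonparametrically
yimp1 <- Impute(data = y, mu = NA, sig = NA, imputation.method = "Normal", resid = NA)
yimp2 <- Impute(data = y, mu = NA, sig = NA, imputation.method = "Dist.Free", resid = NA)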


MissMech documentation built on May 29, 2024, 11:57 a.m.