methy.mice: Multiple Imputation Based Method

In ChongWu-Biostat/MethyImpute: Imputing the missing covariates via methylation data


View source: R/methy_mice.R

Multiple imputation based method with two-step variable selection.

methy.mice(Y, pheno, missing.index,
  reference.index, missing.cov.name, complete.cov.name,
  max.refernce.methy = 30, m = 30, maxit = 5,
  defaultMethod = c("norm", "logreg", "polyreg", "polr"))

`Y`	Methylation Data Set
`pheno`	A data frame or a matrix containing the incomplete data. Missing values are coded as `NA`.
`missing.index`	Indicators of the subjects who have the missing covariates
`reference.index`	Indicators of the subjects who have the complete covariates
`missing.cov.name`	The missing covariate name
`complete.cov.name`	The complete covariate name
`max.refernce.methy`	The maximum number of CpG sites used to impute the missing covariates. The default is 30.
`m`	Number of multiple imputations. The default is `m = 30`.
`maxit`	A scalar giving the number of iterations. The default is 5.
`defaultMethod`	A vector of four strings containing the default imputation methods for numeric columns, factor columns with 2 levels, unordered factor columns with more than two levels, and ordered factor columns with more than two levels, respectively. If nothing is specified, the following defaults will be used: `norm`, Bayesian linear regression (numeric data); `logreg`, logistic regression imputation (binary data, factor with 2 levels); `polyreg`, polytomous regression imputation for unordered categorical data (factor with more than 2 levels); `polr`, proportional odds model (ordered factor with more than 2 levels).
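As a rough illustration of how the index and name arguments relate to `pheno`, the sketch below derives them from a toy phenotype table. The data, the column names `age` and `smoking`, and the choice of integer row positions for the two index arguments are all assumptions made for this example; the function itself is not called here.

```r
# Hypothetical toy phenotype table: 'age' is complete, 'smoking' is partly missing.
pheno <- data.frame(
  age     = c(34, 51, 47, 29, 62, 55),
  smoking = c(1, NA, 0, NA, 1, 0)
)

missing.cov.name  <- "smoking"  # covariate to be imputed
complete.cov.name <- "age"      # fully observed covariate

# Assumption: the index arguments are integer row positions of the subjects.
missing.index   <- which(is.na(pheno[[missing.cov.name]]))
reference.index <- which(!is.na(pheno[[missing.cov.name]]))

missing.index    # 2 4
reference.index  # 1 3 5 6
```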

Our proposed method combines multiple imputation with variable selection for high-dimensional data. To deal with high-dimensional methylation data, we use a two-step variable selection approach consisting of a screening stage and a selection stage. A standard multiple imputation approach is then applied to impute the missing covariate values and to account for the uncertainty of the imputation.
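The screening stage can be pictured with a minimal sketch such as the one below. It ranks simulated CpG sites by absolute marginal correlation with the covariate among reference subjects and keeps at most 30 of them. The screening statistic, the simulated data, the subjects-in-rows layout, and the names `Y.ref`, `score`, and `keep` are assumptions for illustration only; the package may screen differently, and the selection stage is not shown.

```r
set.seed(1)
n.ref     <- 50   # reference subjects (covariate observed)
n.cpg     <- 200  # candidate CpG sites
max.sites <- 30   # cap on retained sites, mirroring max.refernce.methy

# Simulated methylation matrix: rows are subjects, columns are CpG sites (an assumption).
Y.ref <- matrix(rnorm(n.ref * n.cpg), nrow = n.ref,
                dimnames = list(NULL, paste0("cg", seq_len(n.cpg))))

# Simulated covariate driven by the first three sites plus noise.
x <- as.vector(Y.ref[, 1:3] %*% c(2, -2, 1.5) + rnorm(n.ref, sd = 0.5))

# Screen: rank sites by absolute marginal correlation with the covariate.
score <- abs(cor(Y.ref, x))[, 1]
keep  <- names(sort(score, decreasing = TRUE))[seq_len(max.sites)]

head(keep)
```

Only the columns named in `keep` would then be passed on to a selection step and, finally, to the imputation model.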

Generates multiple imputations for incomplete multivariate data by Gibbs sampling. Missing data can occur anywhere in the data. The algorithm imputes an incomplete column (the target column) by generating 'plausible' synthetic values given other columns in the data. Each incomplete column must act as a target column, and has its own specific set of predictors. The default set of predictors for a given target consists of all other columns in the data. For predictors that are incomplete themselves, the most recently generated imputations are used to complete the predictors prior to imputation of the target column.
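The column-by-column cycle described above can be sketched in base R as follows. This is a simplified illustration on simulated data, not the algorithm as implemented: each target is imputed by linear regression plus residual noise, without drawing the regression parameters, and the names `dat`, `imp`, `miss`, and `targets` are hypothetical.

```r
set.seed(2)
n  <- 100
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n, sd = 0.6)
x3 <- 0.5 * x1 - 0.4 * x2 + rnorm(n, sd = 0.6)
dat <- data.frame(x1, x2, x3)
dat$x2[sample(n, 20)] <- NA
dat$x3[sample(n, 20)] <- NA

miss    <- is.na(dat)                        # which cells were missing
targets <- names(which(colSums(miss) > 0))   # incomplete columns

# Initialise each gap with a random draw from the observed values.
imp <- dat
for (v in targets) {
  imp[miss[, v], v] <- sample(dat[!miss[, v], v], sum(miss[, v]), replace = TRUE)
}

maxit <- 5
for (it in seq_len(maxit)) {
  for (v in targets) {
    # Regress the target on all other columns, using rows where it was observed.
    # Predictors that are incomplete carry their most recent imputations.
    fit <- lm(reformulate(setdiff(names(imp), v), response = v),
              data = imp[!miss[, v], ])
    # Replace the missing entries with predictions plus residual noise.
    pred <- predict(fit, newdata = imp[miss[, v], ])
    imp[miss[, v], v] <- pred + rnorm(length(pred), sd = summary(fit)$sigma)
  }
}

colSums(is.na(imp))  # no missing values remain
```

Repeating the whole procedure `m` times with different random draws would give `m` completed data sets.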

A separate univariate imputation model can be specified for each column. The default imputation method depends on the measurement level of the target column. In addition to these defaults, several other methods are provided. Users can also write their own imputation functions and call them from within the algorithm.

The data may contain categorical variables that are used in regressions on other variables. The algorithm creates dummy variables for the categories of these variables, and imputes these from the corresponding categorical variable. The extended model containing the dummy variables is called the padded model. Its structure is stored in the list component pad.
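To make the dummy-variable expansion concrete, the short sketch below uses base R's `model.matrix` on a toy data frame. The data and the column names `y` and `group` are hypothetical, and this only illustrates dummy coding in general, not the internal padded model.

```r
# Toy data with a three-level factor used as a predictor.
dat <- data.frame(
  y     = c(1.2, 0.7, 2.1, 1.9, 0.4, 1.5),
  group = factor(c("a", "b", "c", "a", "b", "c"))
)

# Treatment coding: one indicator column per non-reference level.
X <- model.matrix(~ group, data = dat)
colnames(X)  # "(Intercept)" "groupb" "groupc"
```

The reference level `a` is absorbed into the intercept, so a factor with three levels contributes two dummy columns to the regression.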

Built-in elementary imputation methods are:

pmm: Predictive mean matching (any)
norm: Bayesian linear regression (numeric)
norm.nob: Linear regression ignoring model error (numeric)
norm.boot: Linear regression using bootstrap (numeric)
norm.predict: Linear regression, predicted values (numeric)
mean: Unconditional mean imputation (numeric)
2l.norm: Two-level normal imputation (numeric)
2l.pan: Two-level normal imputation using pan (numeric)
2lonly.mean: Imputation at level-2 of the class mean (numeric)
2lonly.norm: Imputation at level-2 by Bayesian linear regression (numeric)
2lonly.pmm: Imputation at level-2 by Predictive mean matching (any)
quadratic: Imputation of quadratic terms (numeric)
logreg: Logistic regression (factor, 2 levels)
logreg.boot: Logistic regression with bootstrap
polyreg: Polytomous logistic regression (factor, >= 2 levels)
polr: Proportional odds model (ordered, >=2 levels)
lda: Linear discriminant analysis (factor, >= 2 categories)
cart: Classification and regression trees (any)
rf: Random forest imputations (any)
ri: Random indicator method for nonignorable data (numeric)
sample: Random sample from the observed values (any)
fastpmm: Experimental: Fast predictive mean matching using C++ (any)
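As an illustration of one entry in the list above, predictive mean matching can be sketched in a few lines of base R. This is a simplified version under stated assumptions: the data are simulated, the donor pool size `k = 5` is chosen arbitrarily, and the regression coefficients are not drawn from their posterior, so it is not a substitute for the built-in `pmm` method.

```r
set.seed(3)
n <- 60
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
y[sample(n, 15)] <- NA
obs <- !is.na(y)

# Fit on the observed cases and predict the mean for every case.
fit  <- lm(y ~ x, subset = obs)
pred <- predict(fit, newdata = data.frame(x = x))

k <- 5  # number of candidate donors per missing value (an assumption)
y.imp <- y
for (i in which(!obs)) {
  # Donors: the k observed cases whose predicted mean is closest to case i.
  donors <- order(abs(pred[obs] - pred[i]))[seq_len(k)]
  # Impute by copying the observed value of one randomly chosen donor.
  y.imp[i] <- y[obs][donors[sample.int(k, 1)]]
}

all(y.imp[!obs] %in% y[obs])  # imputed values are always observed values
```

Because every imputed value is copied from a real observation, imputations stay within the observed range, which is why the method is listed as applicable to any variable type.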

Chong Wu

### Do Not Run

ChongWu-Biostat/MethyImpute documentation built on May 6, 2019, 11:18 a.m.
