Multiple Imputation Based Method with Two-Step Variable Selection
methy.mice(Y, pheno, missing.index,
    reference.index, missing.cov.name, complete.cov.name,
    max.refernce.methy = 30, m = 30, maxit = 5,
    defaultMethod = c("norm", "logreg", "polyreg", "polr"))
Y
    Methylation data set.
pheno
    A data frame or a matrix containing the incomplete data. Missing values are coded as NA.
missing.index
    Indicators of the subjects who have missing covariates.
reference.index
    Indicators of the subjects who have complete covariates.
missing.cov.name
    The name of the covariate with missing values.
complete.cov.name
    The name of the complete covariate.
max.refernce.methy
    The maximum number of CpG sites used to impute the missing covariates. The default is 30.
m
    Number of multiple imputations. The default is 30.
maxit
    A scalar giving the number of iterations. The default is 5.
defaultMethod
    A vector of four strings containing the default imputation methods for numerical columns, factor columns with 2 levels, unordered factor columns with more than two levels, and ordered factor columns with more than two levels, respectively. If nothing is specified, the following defaults will be used: "norm", "logreg", "polyreg", "polr".
Our proposed method combines multiple imputation with variable selection for high-dimensional data. To handle high-dimensional methylation data, we use a two-step variable selection approach consisting of a screening stage and a selection stage. A standard multiple imputation approach is then applied to impute the missing covariate values and to account for the uncertainty of the imputation.
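The two-step idea can be illustrated with a minimal base R sketch. The source does not say which screening statistic or selection procedure is used, so this sketch swaps in marginal correlation for the screen and greedy forward selection for the selection stage. The helper name select_cpgs, its arguments, and the layout of Y (rows are CpG sites, columns are subjects) are assumptions made for illustration and are not part of the package.

```r
# Hypothetical helper, not part of the package: two-step CpG selection.
# Y: numeric matrix, rows = CpG sites, columns = subjects (assumed layout).
# x: numeric covariate observed for every subject in Y.
select_cpgs <- function(Y, x, n_screen = 100, n_keep = 30) {
  # Step 1 (screen): rank sites by absolute marginal correlation with x
  # and keep the n_screen strongest ones.
  score <- abs(apply(Y, 1, function(site) cor(site, x)))
  screened <- order(score, decreasing = TRUE)[seq_len(min(n_screen, nrow(Y)))]

  # Step 2 (select): greedy forward selection among the screened sites,
  # each round adding the site that most reduces the residual sum of squares.
  chosen <- integer(0)
  while (length(chosen) < min(n_keep, length(screened))) {
    candidates <- setdiff(screened, chosen)
    rss <- vapply(candidates, function(j) {
      design <- data.frame(x = x, t(Y[c(chosen, j), , drop = FALSE]))
      sum(resid(lm(x ~ ., data = design))^2)
    }, numeric(1))
    chosen <- c(chosen, candidates[which.min(rss)])
  }
  chosen  # row indices of the selected CpG sites
}

# Usage on simulated data: 200 sites, 60 subjects, two informative sites.
set.seed(1)
Y_sim <- matrix(rnorm(200 * 60), nrow = 200)
x_sim <- 3 * Y_sim[7, ] - 3 * Y_sim[50, ] + rnorm(60, sd = 0.1)
select_cpgs(Y_sim, x_sim, n_screen = 20, n_keep = 2)
```

Screening first keeps the second, more expensive stage tractable when the number of CpG sites is far larger than the number of subjects.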
Generates multiple imputations for incomplete multivariate data by Gibbs sampling. Missing data can occur anywhere in the data. The algorithm imputes an incomplete column (the target column) by generating 'plausible' synthetic values given other columns in the data. Each incomplete column must act as a target column, and has its own specific set of predictors. The default set of predictors for a given target consists of all other columns in the data. For predictors that are incomplete themselves, the most recently generated imputations are used to complete the predictors prior to imputation of the target column.
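The loop described above can be sketched in a few lines of base R. This is a simplified illustration for numeric columns only, assuming ordinary linear regression with added residual noise as the univariate model. It is not the algorithm's actual code, which draws model parameters from their posterior and supports other column types. The function name impute_chained is hypothetical.

```r
# Minimal chained-equations sketch for numeric columns (illustrative only).
impute_chained <- function(data, maxit = 5) {
  miss <- is.na(data)                       # remember which cells were missing
  targets <- names(data)[colSums(miss) > 0] # every incomplete column is a target

  # Initialise each incomplete column with random draws from its observed values.
  for (v in targets) {
    obs <- data[[v]][!miss[, v]]
    draws <- sample.int(length(obs), sum(miss[, v]), replace = TRUE)
    data[[v]][miss[, v]] <- obs[draws]
  }

  for (iter in seq_len(maxit)) {
    for (v in targets) {
      # Fit the target on all other columns, using the most recent
      # imputations of any incomplete predictors.
      fit <- lm(reformulate(setdiff(names(data), v), response = v),
                data = data[!miss[, v], , drop = FALSE])
      pred <- predict(fit, newdata = data[miss[, v], , drop = FALSE])
      # Add residual noise so imputations are plausible draws, not fitted means.
      data[[v]][miss[, v]] <- pred + rnorm(length(pred), sd = summary(fit)$sigma)
    }
  }
  data
}

# Usage: two incomplete columns, one complete column.
set.seed(42)
df <- data.frame(a = rnorm(50), b = rnorm(50))
df$c <- df$a + df$b + rnorm(50, sd = 0.1)
df$a[1:5] <- NA
df$c[6:10] <- NA
completed <- impute_chained(df, maxit = 3)
```

Running the function m times with different random seeds would give m completed data sets, which is what makes the imputation "multiple".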
A separate univariate imputation model can be specified for each column. The default imputation method depends on the measurement level of the target column. In addition to these defaults, several other methods are provided. You can also write your own imputation functions and call them from within the algorithm.
The data may contain categorical variables that are used in regressions on
other variables. The algorithm creates dummy variables for the categories of
these variables, and imputes these from the corresponding categorical
variable. The extended model containing the dummy variables is called the
padded model. Its structure is stored in the list component pad.
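As a general illustration of dummy coding (not of the internal padding code), base R's model.matrix() shows how a factor expands into 0/1 columns when it is used as a predictor. The data frame and its column names below are made up for the example.

```r
# Hypothetical phenotype data with one numeric and one categorical covariate.
pheno_demo <- data.frame(
  age    = c(34, 51, 47, 29),
  smoker = factor(c("never", "current", "former", "never"))
)

# model.matrix() expands the factor into one 0/1 column per non-reference
# level; the first level in alphabetical order ("current") is the reference.
padded <- model.matrix(~ age + smoker, data = pheno_demo)
colnames(padded)
```

A factor with k levels contributes k - 1 dummy columns, so the design matrix is wider than the original data.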
Built-in elementary imputation methods are:
Predictive mean matching (any)
Bayesian linear regression (numeric)
Linear regression ignoring model error (numeric)
Linear regression using bootstrap (numeric)
Linear regression, predicted values (numeric)
Unconditional mean imputation (numeric)
Two-level normal imputation (numeric)
Two-level normal imputation using pan (numeric)
Imputation at level-2 of the class mean (numeric)
Imputation at level-2 by Bayesian linear regression (numeric)
Imputation at level-2 by predictive mean matching (any)
Imputation of quadratic terms (numeric)
Logistic regression (factor, 2 levels)
Logistic regression with bootstrap
Polytomous logistic regression (factor, >= 2 levels)
Proportional odds model (ordered, >= 2 levels)
Linear discriminant analysis (factor, >= 2 categories)
Classification and regression trees (any)
Random forest imputations (any)
Random indicator method for nonignorable data (numeric)
Random sample from the observed values (any)
Experimental: Fast predictive mean matching using C++ (any)
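To make one entry of this list concrete, here is a simplified sketch of predictive mean matching with a single numeric predictor. It matches on predictions from a plain least-squares fit, whereas full implementations also draw the regression parameters to reflect their uncertainty. The function name pmm_impute and the donors argument are illustrative assumptions.

```r
# Simplified predictive mean matching (illustrative; point-estimate version).
pmm_impute <- function(y, x, donors = 5) {
  miss <- is.na(y)
  dat <- data.frame(y = y, x = x)
  fit <- lm(y ~ x, data = dat[!miss, ])   # fit on the observed cases only
  pred <- predict(fit, newdata = dat)     # predicted means for every case
  obs_idx <- which(!miss)
  k <- min(donors, length(obs_idx))
  for (i in which(miss)) {
    # Donor pool: the k observed cases whose predictions are closest.
    nearest <- obs_idx[order(abs(pred[obs_idx] - pred[i]))[seq_len(k)]]
    # Impute with the observed value of one randomly chosen donor.
    y[i] <- y[nearest[sample.int(k, 1)]]
  }
  y
}

# Usage on simulated data with three missing responses.
set.seed(3)
x_demo <- rnorm(30)
y_demo <- 2 * x_demo + rnorm(30)
y_demo[c(2, 9, 17)] <- NA
pmm_impute(y_demo, x_demo)
```

Because every imputed value is copied from an observed case, the method never produces values outside the observed range, which is why it is marked as usable for any column type.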
Chong Wu
### Do Not Run