miceRanger | R Documentation |
Performs multiple imputation by chained random forests. Returns a miceDefs object, which contains information about the imputation process.
miceRanger(
  data,
  m = 5,
  maxiter = 5,
  vars,
  valueSelector = c("meanMatch", "value"),
  meanMatchCandidates = pmax(round(nrow(data) * 0.01), 5),
  returnModels = FALSE,
  parallel = FALSE,
  verbose = TRUE,
  ...
)
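The default for meanMatchCandidates in the usage above depends on the number of rows in data. As a small illustration in base R only (no miceRanger call is made), the expression can be evaluated for a few row counts. The helper name defaultCandidates is hypothetical and used only for this sketch.

```r
# Evaluate the default expression from the usage line for several row counts.
# `defaultCandidates` is a hypothetical helper, not part of miceRanger.
defaultCandidates <- function(nRows) pmax(round(nRows * 0.01), 5)

small  <- defaultCandidates(nrow(iris))  # 150 rows: round(1.5) = 2, floor of 5 applies
medium <- defaultCandidates(1000)        # 1% of 1000 rows
large  <- defaultCandidates(25000)       # 1% of 25000 rows

print(c(small = small, medium = medium, large = large))
```

Under this expression the default is roughly 1% of the rows, but never fewer than 5 candidates.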
data |
A data.frame or data.table to be imputed. |
m |
The number of datasets to produce. |
maxiter |
The number of iterations to run for each dataset. |
vars |
Specifies which and how variables should be imputed. Can be specified in 3 different ways:
|
valueSelector |
How to select the value to be imputed from the model predictions.
Can be "meanMatch", "value", or a named vector containing a mixture of those values.
If a named vector is passed, the names must equal the variables to be imputed specified in |
meanMatchCandidates |
Specifies the number of candidate values from which the mean matching
algorithm selects. Can be specified either as an integer or as a named integer vector for different
values by variable. If a named integer vector is passed, the names of the vector must contain at a
minimum the names of the numeric variables imputed using |
returnModels |
Logical. Should the final model for each variable be returned? Set to |
parallel |
Should the process run in parallel? Usually not necessary. This process will
take advantage of any cluster set up when |
verbose |
Logical. Should progress be printed? |
... |
other parameters passed to |
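The difference between the two valueSelector options can be sketched in base R. The following is a minimal illustration of generic mean matching, under several assumptions: a linear model stands in for the random forest, the helper meanMatchSketch is hypothetical, and miceRanger's actual implementation may differ in its details. For each missing value, the k observed rows with the closest predictions are found and one of their observed values is drawn at random, where k plays the role of meanMatchCandidates.

```r
set.seed(1)

# Toy data: y has missing values, x is fully observed.
d <- data.frame(x = rnorm(100))
d$y <- 2 * d$x + rnorm(100, sd = 0.5)
d$y[sample(100, 20)] <- NA
miss <- is.na(d$y)

# Stand-in model (lm, NOT a random forest), fitted on the observed rows only.
fit  <- lm(y ~ x, data = d[!miss, ])
pred <- unname(predict(fit, newdata = d))

# Hypothetical helper: for each missing row, take the k observed rows whose
# predictions are closest, then sample one of their observed values.
meanMatchSketch <- function(pred, y, miss, k = 5) {
  obsIdx <- which(!miss)
  vapply(which(miss), function(i) {
    nearest <- obsIdx[order(abs(pred[obsIdx] - pred[i]))[seq_len(k)]]
    y[nearest[sample.int(k, 1)]]
  }, numeric(1))
}

imputedMatch <- meanMatchSketch(pred, d$y, miss, k = 5)  # analogous to "meanMatch"
imputedValue <- pred[miss]                               # analogous to "value"
```

In this sketch, mean matching can only return values that were actually observed, while the "value" analogue returns the raw model prediction.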
A miceDefs object, containing the following:
callParams |
The parameters of the object. |
data |
The original data provided by the user, cast to a data.table. |
naWhere |
Logical index of missing data, having the same dimensions as |
missingCounts |
The number of missing values for each variable. |
rawClasses |
The original classes provided in |
newClasses |
The new classes of the returned data. |
allImps |
The imputations of all variables at each iteration, for each dataset. |
allImport |
The variable importance metrics at each iteration, for each dataset. |
allError |
The OOB model error for all variables at each iteration, for each dataset. |
finalImps |
The final imputations for each dataset. |
finalImport |
The final variable importance metrics for each dataset. |
finalError |
The final model error for each variable in every dataset. |
finalModels |
Only returned if |
imputationTime |
The total time in seconds taken to create the imputations for the specified datasets and iterations. Does not include any setup time. |
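To make the naWhere and missingCounts entries more concrete, here is a base R sketch of the kind of information they describe. It is an approximation for illustration only: the names naMask and naCounts are hypothetical, and the exact storage format inside a miceDefs object is not specified above.

```r
# Toy data with a few missing entries.
df <- data.frame(
  a = c(1, NA, 3),
  b = c("x", "y", NA),
  c = 1:3
)

naMask   <- is.na(df)        # logical matrix with the same dimensions as df
naCounts <- colSums(naMask)  # named vector: missing values per column

print(naMask)
print(naCounts)
```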
It is highly recommended to visit the GitHub README for a thorough walkthrough of miceRanger's capabilities, as well as performance benchmarks.
Several vignettes are also available on miceRanger's listing on the CRAN website.
#################
## Simple Example
data(iris)
ampIris <- amputeData(iris)
miceObj <- miceRanger(
    ampIris
  , m = 1
  , maxiter = 1
  , verbose = FALSE
  , num.threads = 1
  , num.trees = 5
)

##################
## Run in parallel
data(iris)
ampIris <- amputeData(iris)
library(doParallel)
cl <- makeCluster(2)
registerDoParallel(cl)

# Perform mice
miceObjPar <- miceRanger(
    ampIris
  , m = 2
  , maxiter = 2
  , parallel = TRUE
  , verbose = FALSE
)
stopCluster(cl)
registerDoSEQ()

############################
## Complex Imputation Schema
data(iris)
ampIris <- amputeData(iris)

# Define variables to impute, as well as their predictors
v <- list(
    Sepal.Width = c("Sepal.Length", "Petal.Width", "Species")
  , Sepal.Length = c("Sepal.Width", "Petal.Width")
  , Species = c("Sepal.Width")
)

# Specify mean matching for certain variables.
vs <- c(
    Sepal.Width = "meanMatch"
  , Sepal.Length = "value"
  , Species = "meanMatch"
)

# Different mean matching candidates per variable.
mmc <- c(
    Sepal.Width = 4
  , Species = 10
)

miceObjCustom <- miceRanger(
    ampIris
  , m = 1
  , maxiter = 1
  , vars = v
  , valueSelector = vs
  , meanMatchCandidates = mmc
  , verbose = FALSE
)