miceRanger: Fast Imputation with Random Forests
In miceRanger: Multiple Imputation by Chained Equations with Random Forests

Performs multiple imputation by chained random forests. Returns a miceDefs object, which contains information about the imputation process.

miceRanger(
  data,
  m = 5,
  maxiter = 5,
  vars,
  valueSelector = c("meanMatch", "value"),
  meanMatchCandidates = pmax(round(nrow(data) * 0.01), 5),
  returnModels = FALSE,
  parallel = FALSE,
  verbose = TRUE,
  ...
)

`data`	A data.frame or data.table to be imputed.
`m`	The number of datasets to produce.
`maxiter`	The number of iterations to run for each dataset.
`vars`	Specifies which variables should be imputed, and which predictors are used for each. Can be specified in three ways: <missing> If not provided, all columns will be imputed using all columns. A column with no missing values is not imputed, but it is still used as a feature to impute the other columns. <character vector> If a character vector of column names is passed, these columns will be imputed using all available columns in the dataset. The order of this vector determines the order in which the variables are imputed. <named list of character vectors> Predictors can be specified for each variable with a named list. List names are the variables to impute. Elements in each vector are the features used to impute that variable. The order of this list determines the order in which the variables are imputed.
`valueSelector`	How to select the value to be imputed from the model predictions. Can be "meanMatch", "value", or a named vector containing a mixture of those values. If a named vector is passed, the names must equal the variables to be imputed specified in `vars`.
`meanMatchCandidates`	Specifies the number of candidate values from which the mean matching algorithm selects. Can be specified either as a single integer or as a named integer vector giving a different value per variable. If a named integer vector is passed, its names must include, at a minimum, the numeric variables imputed using `valueSelector = "meanMatch"`.
`returnModels`	Logical. Should the final model for each variable be returned? Set to `TRUE` to use the `impute` function, which allows imputing new samples without having to run `miceRanger` again. Setting this to `TRUE` can cause the returned `miceDefs` object to take up a lot of memory, so use it only if you plan on using the `impute` function.
`parallel`	Logical. Should the process run in parallel? Usually not necessary. If `TRUE`, the process takes advantage of whatever cluster is set up when `miceRanger` is called.
`verbose`	Logical. Should progress be printed?
`...`	Other parameters passed to `ranger()` to control forest growth.
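
The mean matching idea behind `valueSelector = "meanMatch"` and `meanMatchCandidates` can be illustrated with a small base R sketch. This is only an illustration of predictive mean matching in general, not miceRanger's internal code: the helper `meanMatchSketch` and its argument names are hypothetical. For each missing entry, it finds the observed rows whose model predictions are closest to the prediction for the missing entry, then draws one of their real observed values.

```r
# Hypothetical helper illustrating predictive mean matching in general.
# It is NOT miceRanger's internal implementation.
meanMatchSketch <- function(predObserved, yObserved, predMissing, candidates = 5) {
  k <- min(candidates, length(yObserved))
  vapply(predMissing, function(p) {
    # Observed rows whose predictions are closest to the prediction p
    nearest <- order(abs(predObserved - p))[seq_len(k)]
    # Draw one real observed value from those candidates
    yObserved[nearest[sample.int(k, 1)]]
  }, numeric(1))
}

# Default candidate count from the usage above, for a 150-row dataset such as iris
defaultCandidates <- pmax(round(150 * 0.01), 5)

set.seed(42)
yObs    <- c(1, 2, 3, 4, 5, 6)   # observed values of the variable
predObs <- yObs + 0.1            # stand-in for model predictions on observed rows
imputed <- meanMatchSketch(predObs, yObs, predMissing = c(1.05, 5.9), candidates = 2)
```

Because every imputed value is drawn from `yObs`, mean matching can only return values that already occur in the data, whereas `"value"` uses the model prediction directly.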

A `miceDefs` object, containing the following:

`callParams`	The parameters of the object.
`data`	The original data provided by the user, cast to a data.table.
`naWhere`	Logical index of missing data, having the same dimensions as `data`.
`missingCounts`	The number of missing values for each variable.
`rawClasses`	The original classes provided in `data`.
`newClasses`	The new classes of the returned data.
`allImps`	The imputations of all variables at each iteration, for each dataset.
`allImport`	The variable importance metrics at each iteration, for each dataset.
`allError`	The OOB model error for all variables at each iteration, for each dataset.
`finalImps`	The final imputations for each dataset.
`finalImport`	The final variable importance metrics for each dataset.
`finalError`	The final model error for each variable in every dataset.
`finalModels`	Only returned if `returnModels = TRUE`. A list of `ranger` random forests for each dataset/variable.
`imputationTime`	The total time in seconds taken to create the imputations for the specified datasets and iterations. Does not include any setup time.
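
As a rough sketch of how the returned object might be used, the snippet below inspects a few of the components listed above, extracts the completed datasets, and reuses the stored models on new data. It assumes the miceRanger package is installed and relies on its `amputeData()`, `completeData()` and `impute()` functions. The small `num.trees` value is chosen only to keep the run short, not as a recommendation.

```r
library(miceRanger)

set.seed(1)
ampIris <- amputeData(iris)   # randomly set some values to NA

miceObj <- miceRanger(
    ampIris
  , m = 2
  , maxiter = 1
  , returnModels = TRUE       # keep the models so impute() can be used later
  , verbose = FALSE
  , num.trees = 5             # passed on to ranger(); small only for speed
)

# A few of the components described above
miceObj$callParams$m          # number of datasets requested
miceObj$missingCounts         # missing values per variable
dim(miceObj$naWhere)          # same dimensions as the data

# Extract the completed datasets as a list, one element per dataset
completed <- completeData(miceObj)
remainingNA <- sum(is.na(completed[[1]]))

# Impute new rows with the stored models, without rerunning miceRanger
newIris <- amputeData(iris)
newImputed <- impute(newIris, miceObj, verbose = FALSE)
```

If `returnModels` had been left at its default of `FALSE`, `finalModels` would not be present and the `impute()` call would not be possible.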

It is highly recommended to visit the GitHub README for a thorough walkthrough of miceRanger's capabilities, as well as performance benchmarks.

Several vignettes are also available on miceRanger's listing on the CRAN website.

#################
## Simple Example

library(miceRanger)
data(iris)
ampIris <- amputeData(iris)

miceObj <- miceRanger(
    ampIris
  , m = 1
  , maxiter = 1
  , verbose = FALSE
  , num.threads = 1
  , num.trees = 5
)


##################
## Run in parallel

data(iris)
ampIris <- amputeData(iris)

library(doParallel)
cl <- makeCluster(2)
registerDoParallel(cl)

# Run miceRanger on the registered cluster
miceObjPar <- miceRanger(
    ampIris
  , m = 2
  , maxiter = 2
  , parallel = TRUE
  , verbose = FALSE
)
# Shut down the cluster and return to sequential execution
stopCluster(cl)
registerDoSEQ()


############################
## Complex Imputation Schema

data(iris)
ampIris <- amputeData(iris)

# Define variables to impute, as well as their predictors
v <- list(
  Sepal.Width = c("Sepal.Length","Petal.Width","Species")
  , Sepal.Length = c("Sepal.Width","Petal.Width")
  , Species = c("Sepal.Width")
)

# Specify mean matching for certain variables.
vs <- c(
  Sepal.Width = "meanMatch"
  , Sepal.Length = "value"
  , Species = "meanMatch"
)

# Different mean matching candidates per variable.
mmc <- c(
  Sepal.Width = 4
  , Species = 10
)

miceObjCustom <- miceRanger(
    ampIris
  , m = 1
  , maxiter = 1
  , vars = v
  , valueSelector = vs
  , meanMatchCandidates = mmc
  , verbose = FALSE
)