rrfImpute | R Documentation |
Impute missing values in predictor data using proximity from RRF.
## Default S3 method:
rrfImpute(x, y, iter=5, ntree=300, ...)
## S3 method for class 'formula'
rrfImpute(x, data, ..., subset)
x |
A data frame or matrix of predictors, some containing NAs, or a formula. |
y |
Response vector (NAs not allowed). |
data |
A data frame containing the predictors and response. |
iter |
Number of iterations to run the imputation. |
ntree |
Number of trees to grow in each iteration of RRF. |
... |
Other arguments to be passed to RRF. |
subset |
A logical vector indicating which observations to use. |
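As an illustration of the default (non-formula) interface, the following sketch passes the predictors and the response separately. It assumes the RRF package is installed; the choice of `iris`, the number of dropped values, and the values of `iter` and `ntree` are arbitrary and only for demonstration.

```r
# Minimal sketch of the default method: predictors in x, response in y.
# Assumes the RRF package is available; skipped otherwise.
if (requireNamespace("RRF", quietly = TRUE)) {
  data(iris)
  x <- iris[, 1:4]          # predictors only
  y <- iris$Species         # response, must not contain NAs

  set.seed(1)
  x[sample(nrow(x), 10), "Sepal.Length"] <- NA   # artificially drop values

  # Fewer iterations and trees than the defaults, just to keep the run short.
  completed <- RRF::rrfImpute(x, y, iter = 3, ntree = 100)

  # The response is returned as the first column, followed by the predictors.
  str(completed)
}
```

Extra arguments given after `ntree` are forwarded through `...`.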
The algorithm starts by imputing NAs using na.roughfix. Then RRF is called
with the completed data. The proximity matrix from the RRF is used to update
the imputation of the NAs. For continuous predictors, the imputed value is the
weighted average of the non-missing observations, where the weights are the
proximities. For categorical predictors, the imputed value is the category
with the largest average proximity. This process is iterated iter times.
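The proximity-based update can be illustrated with a small base R sketch. This is not the package's internal code: the helper `impute_once` is hypothetical, and the proximity matrix is made up by hand rather than produced by RRF. It performs a single update pass, using a proximity-weighted mean for a numeric column and the category with the largest average proximity for a factor column.

```r
# Toy data: a numeric column with an NA in row 3, a factor column with an NA in row 4.
x <- data.frame(
  num = c(1, 2, NA, 4),
  cat = factor(c("a", "b", "a", NA), levels = c("a", "b"))
)

# Hypothetical symmetric proximity matrix (made-up values, not output of RRF).
prox <- matrix(c(1.0, 0.2, 0.6, 0.1,
                 0.2, 1.0, 0.3, 0.7,
                 0.6, 0.3, 1.0, 0.2,
                 0.1, 0.7, 0.2, 1.0), nrow = 4, byrow = TRUE)

# One update pass over all columns (hypothetical helper for illustration).
impute_once <- function(x, prox) {
  out <- x
  for (j in seq_along(x)) {
    miss <- which(is.na(x[[j]]))
    if (length(miss) == 0) next
    obs <- setdiff(seq_len(nrow(x)), miss)   # rows where column j is observed
    for (i in miss) {
      w <- prox[i, obs]                      # proximities of row i to observed rows
      if (is.numeric(x[[j]])) {
        # Continuous: proximity-weighted average of the observed values.
        out[i, j] <- weighted.mean(x[obs, j], w)
      } else {
        # Categorical: category whose observed rows have the largest mean proximity.
        avg <- tapply(w, x[obs, j], mean)
        out[i, j] <- names(avg)[which.max(avg)]
      }
    }
  }
  out
}

imputed <- impute_once(x, prox)
print(imputed)
```

In the full procedure this pass would be repeated, with the forest refitted on the newly completed data each time to obtain a fresh proximity matrix.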
Note: Imputation has not (yet) been implemented for the unsupervised case. Also, Breiman (2003) notes that the OOB estimate of error from RRF tends to be optimistic when run on the data matrix with imputed values.
A data frame or matrix containing the completed data matrix, where NAs are
imputed using proximity from RRF. The first column contains the response.
Andy Liaw
Leo Breiman (2003). Manual for Setting Up, Using, and Understanding Random Forest V4.0. https://www.stat.berkeley.edu/~breiman/Using_random_forests_v4.0.pdf
na.roughfix.
library(RRF)
data(iris)
iris.na <- iris
set.seed(111)
## artificially drop some data values.
for (i in 1:4) iris.na[sample(150, 20), i] <- NA
set.seed(222)
iris.imputed <- rrfImpute(Species ~ ., iris.na)
set.seed(333)
iris.rf <- RRF(Species ~ ., iris.imputed)
print(iris.rf)