View source: R/neighborhood_imputation.R
impute_features | R Documentation |
This function uses the neighborhoods implied by a random forest to impute missing features. The neighbors of a data point are all the training points assigned to the same leaf as that point in at least one tree of the forest. The weight of each neighbor is the fraction of trees in the forest in which it was assigned to the same leaf. A missing feature of a point is imputed as the weighted average of that feature over all of the point's neighbors, using these neighborhood weights.
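The weighting scheme described above can be sketched in base R for a single numeric feature. This is only an illustration under stated assumptions, not the package's implementation: the leaf assignments in `train_leaves` and `query_leaves` and the values in `train_feature` are made up, and dropping neighbors whose own value is missing is an assumption inferred from the description of `use_mean_imputation_fallback`.

```r
# Hypothetical leaf assignments: rows are trees, columns are training points.
# Entry [t, i] is the leaf that training point i falls into in tree t.
train_leaves <- rbind(
  c(1, 1, 2, 2),
  c(1, 2, 2, 3),
  c(1, 1, 1, 2)
)

# Hypothetical leaf of the point to impute, one entry per tree.
query_leaves <- c(1, 2, 1)

# Hypothetical feature values of the training points (the third is missing).
train_feature <- c(4.0, 5.0, NA, 7.0)

# Weight of a training point: fraction of trees in which it shares a leaf
# with the query point. The comparison recycles query_leaves down each column.
same_leaf <- train_leaves == query_leaves
weights <- colMeans(same_leaf)

# Assumption: neighbors whose own feature value is missing are ignored.
usable <- weights > 0 & !is.na(train_feature)

# Weighted average over the usable neighbors; NA if there are none.
imputed <- if (any(usable)) {
  weighted.mean(train_feature[usable], weights[usable])
} else {
  NA_real_
}
```

Here the fourth training point never shares a leaf with the query point, so it has weight zero and is not a neighbor, and the third point is skipped because its own value is missing.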
impute_features(
  object,
  newdata,
  seed = round(runif(1) * 10000),
  use_mean_imputation_fallback = FALSE
)
object | an object of class 'forestry'. |
newdata | the feature data.frame whose missing features will be imputed. |
seed | a random seed passed to the predict method of forestry. |
use_mean_imputation_fallback | if TRUE, mean imputation (for numeric variables) and mode imputation (for factor variables) are used for any missing feature whose neighbors are all missing that feature as well; if FALSE, these missing features remain NA in the data frame returned by 'impute_features'. |
A data.frame equal to newdata, with its missing values imputed.
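# forestry() and impute_features() are assumed to come from the Rforestry package
library(Rforestry)

# Introduce missing values into a factor column (Species) and a numeric column (Petal.Length)
iris_with_missing <- iris
idx_miss_factor <- sample(nrow(iris), 25, replace = TRUE)
iris_with_missing[idx_miss_factor, 5] <- NA
idx_miss_numeric <- sample(nrow(iris), 25, replace = TRUE)
iris_with_missing[idx_miss_numeric, 3] <- NA

# Use Sepal.Length as the outcome and the remaining columns as features
x <- iris_with_missing[, -1]
y <- iris_with_missing[, 1]

# Fit the forest, then impute the missing feature values in x
forest <- forestry(x, y, ntree = 500, seed = 2, nthread = 2)
imputed_x <- impute_features(forest, x, seed = 2)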