forest_impute: Impute using a tree ensemble in un/supervised setting
In talegari/forager: Compute auxiliary information (proximity, dissimilarity, outlyingness, depth) and imputation from tree ensembles on new data


Description

In the unsupervised case, a tree ensemble is built in an unsupervised way on the imputed data from the previous iteration and used to re-impute the data, until a stopping criterion is met. In the supervised case, a forest is grown in a supervised way (using a response) to impute at every iteration. See 'Details'.

Usage

forest_impute(dataset, responseVarName, method = "synthetic",
  predictMethod = "terminalNodes", implementation = "ranger",
  tol = 0.05, maxIter = 10L, seed = 1L, nproc = 1L, ...)

Arguments

`dataset`	A list with two components. The first (datasetComplete) is a data frame without missing values. The second (datasetMissingBoolean) is a data frame with TRUE where data is missing and FALSE otherwise; its dimensions and column names must match those of datasetComplete.
`responseVarName`	(string) Name of the response variable (supervised case only).
`method`	(string) Method used to build the tree ensemble when no response is given (unsupervised case). Currently, only "synthetic" is implemented.
`predictMethod`	(string) Method used to compute the proximity matrix. Currently, only "terminalNodes" is implemented.
`implementation`	(string) One of 'ranger' or 'randomForest'.
`tol`	(number between 0 and 1) Threshold on the change in the metric. See 'Details'.
`maxIter`	(positive integer) Maximum number of iterations.
`seed`	(positive integer) Seed for growing the forest.
`nproc`	(positive integer) Number of parallel processes to use.
`...`	Arguments passed to synthetic_forest in the unsupervised case.
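As a minimal sketch of the expected `dataset` shape, the base-R helper below builds both components from a data frame that contains NAs. The helper name `make_dataset()` is hypothetical and not part of forager; the median/mode fill mirrors what `randomForest::na.roughfix()` does in the Examples, and naming the two list components is an assumption (the Examples pass an unnamed list).

```r
# Hypothetical helper (not part of forager): build the two-component
# `dataset` list for forest_impute() from a data frame with NAs.
make_dataset <- function(df) {
  # TRUE where a value is missing, FALSE otherwise (same dims and names as df)
  missing_bool <- as.data.frame(is.na(df))

  complete <- df
  for (col in names(df)) {
    x <- df[[col]]
    if (!anyNA(x)) next
    if (is.numeric(x)) {
      # numeric column: fill with the median of the observed values
      x[is.na(x)] <- stats::median(x, na.rm = TRUE)
    } else {
      # factor column: fill with the most frequent observed level
      x[is.na(x)] <- names(which.max(table(x)))
    }
    complete[[col]] <- x
  }
  list(datasetComplete = complete, datasetMissingBoolean = missing_bool)
}
```

The rough fill only provides a starting point; forest_impute then overwrites the originally missing cells on each iteration.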

Details

In the unsupervised case with method "synthetic", a random forest is grown on 'datasetComplete' to separate the actual data from synthetic data. With predictMethod "terminalNodes", the proximity matrix is then computed from the terminal nodes of the trees. In the supervised case, the forest is instead grown with the specified response.
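To illustrate the "terminalNodes" idea, here is a sketch that turns a matrix of terminal-node IDs into a proximity matrix, using the common definition that the proximity of two observations is the fraction of trees in which they land in the same terminal node. The function `proximity_from_nodes()` is hypothetical, and whether forager normalizes in exactly this way is an assumption. With ranger, a node matrix of this shape can be obtained from `predict(fit, data, type = "terminalNodes")$predictions`.

```r
# Hypothetical sketch: proximity from terminal-node IDs.
# nodes[i, t] is the terminal node that observation i falls into in tree t.
proximity_from_nodes <- function(nodes) {
  nodes <- as.matrix(nodes)
  n <- nrow(nodes)
  prox <- matrix(0, n, n)
  for (t in seq_len(ncol(nodes))) {
    # outer(==) marks every pair of observations sharing a leaf in tree t
    prox <- prox + outer(nodes[, t], nodes[, t], `==`)
  }
  # fraction of trees in which each pair shares a terminal node
  prox / ncol(nodes)
}

# two trees, three observations: obs 1 and 2 share a leaf in tree 1,
# obs 2 and 3 share a leaf in tree 2
nodes <- matrix(c(1, 1, 2,
                  3, 4, 4), nrow = 3)
proximity_from_nodes(nodes)
```

The loop keeps memory at one n x n matrix regardless of the number of trees, which matters because the proximity matrix itself grows quadratically with the number of rows.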
The missing values of each covariate are then imputed by a weighted average of that covariate's non-missing values, with the proximities as weights. The result becomes the new 'datasetComplete'.
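The weighted-average step for a single numeric covariate can be sketched as follows. `impute_weighted()` is a hypothetical helper; falling back to the plain observed mean when all weights are zero is an assumption made here, and the factor case (for which `randomForest::rfImpute` picks the category with the largest average proximity) is not shown.

```r
# Hypothetical sketch: proximity-weighted imputation of one numeric covariate.
# x    : numeric vector (missing positions may hold a rough fill)
# miss : logical vector, TRUE where x was originally missing
# prox : n x n proximity matrix
impute_weighted <- function(x, miss, prox) {
  obs <- which(!miss)
  for (i in which(miss)) {
    w <- prox[i, obs]
    # assumption: use the plain mean if i has zero proximity to all observed rows
    x[i] <- if (sum(w) > 0) sum(w * x[obs]) / sum(w) else mean(x[obs])
  }
  x
}

x    <- c(10, 20, 0)
miss <- c(FALSE, FALSE, TRUE)
prox <- matrix(c(1,    0,    0.75,
                 0,    1,    0.25,
                 0.75, 0.25, 1), nrow = 3, byrow = TRUE)
impute_weighted(x, miss, prox)
```

Here the third value becomes 0.75 * 10 + 0.25 * 20 = 12.5, so the observation that is "closer" (higher proximity) pulls the imputed value toward itself.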
This is repeated for at most 'maxIter' iterations, stopping earlier if, between consecutive iterations, the change in the metric (MAPE for continuous covariates, proportion of disagreements for factors) is below the threshold 'tol' for every covariate.
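One reading of the stopping rule is sketched below: the metric is computed per covariate between the imputations of two consecutive iterations, restricted to the originally missing cells, and iteration stops once every value is below `tol`. Both helper names are hypothetical, and using the previous iteration's values as the MAPE denominator is an assumption (it is undefined when a previous value is 0).

```r
# Hypothetical sketch of the per-covariate change metric.
# prev, curr : values of one covariate from consecutive iterations
# miss       : logical vector, TRUE where the covariate was originally missing
change_metric <- function(prev, curr, miss) {
  if (!any(miss)) return(0)
  if (is.numeric(curr)) {
    # MAPE relative to the previous iteration (assumes no zeros in prev[miss])
    mean(abs((curr[miss] - prev[miss]) / prev[miss]))
  } else {
    # proportion of imputed labels that changed between iterations
    mean(curr[miss] != prev[miss])
  }
}

# Stop when every covariate changed by less than tol
has_converged <- function(prev_df, curr_df, miss_df, tol = 0.05) {
  changes <- mapply(change_metric, prev_df, curr_df, miss_df)
  all(changes < tol)
}
```

Because `mapply()` walks the data frames column by column, each covariate is judged by the metric suited to its type.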

Value

A list with the following elements:

data: The imputed dataset.
iter: Number of iterations performed.
errors: A vector holding the metric from the last iteration, one value per covariate.

See Also

rfImpute (randomForest package)

Examples

## Not run: 
# example of unsupervised imputation

library("forager")   # provides forest_impute()
library("magrittr")  # provides %>%

# create 20% artificial missing values at random
iris_with_na  <- missRanger::generateNA(iris, 0.2, seed = 1)
# rough imputation: column median (numeric) / most frequent level (factor)
iris_complete <- randomForest::na.roughfix(iris_with_na)
# dataframe of missing positions
iris_missing  <- is.na(iris_with_na) %>% as.data.frame()

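# unsupervised imputation, forests grown with ranger
imp1        <- forest_impute(list(iris_complete, iris_missing)
                             , implementation = "ranger"
                             )

# the same with randomForest (overwrites imp1; this result is used below)
imp1        <- forest_impute(list(iris_complete, iris_missing)
                             , implementation = "randomForest"
                             )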

imp1$iter # number of iterations
imp1$errors # errors of the last iteration

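# error on the originally missing cells (z == TRUE) against the actual data:
# MAPE for numeric columns, proportion of mismatches for factors
metric_relative <- function(x, y, z){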

  if(sum(z) == 0){
    return(0)
  }

  if(is.numeric(x)){
    mean(abs((y[z] - x[z])/y[z]))
  } else {
    sum(x[z] != y[z])/sum(z)
  }

}

compare_roughimpute_with_actual <-
  Map(metric_relative, iris_complete, iris, iris_missing) %>%
    unlist()
compare_forest_impute_with_actual <-
  Map(metric_relative, imp1$data, iris, iris_missing) %>%
    unlist()

perf <- data.frame(
  colnames = names(compare_forest_impute_with_actual)
  , rough  = round(compare_roughimpute_with_actual, 2)
  , forest = round(compare_forest_impute_with_actual, 2)
  )
rownames(perf) <- NULL
perf

# example of supervised imputation

# create data for supervised case
iris_complete2         <- iris_complete
iris_complete2$Species <- iris$Species   # the response must be fully observed

iris_missing2 <- iris_missing
# no missing values in the response: one FALSE per row
iris_missing2$Species <- rep(FALSE, nrow(iris_missing))

imp2        <- forest_impute(list(iris_complete2, iris_missing2)
                             , "Species"
                             , implementation = "ranger"
                             )


imp2        <- forest_impute(list(iris_complete2, iris_missing2)
                             , "Species"
                             , implementation = "randomForest"
                             )

compare_forest_impute_sup_with_actual <-
  Map(metric_relative, imp2$data, iris, iris_missing2) %>% unlist()

perf2 <- data.frame(
  colnames     = names(compare_forest_impute_sup_with_actual)
  , rough      = round(compare_roughimpute_with_actual, 2)
  , forest_sup = round(compare_forest_impute_sup_with_actual, 2)
  )
rownames(perf2) <- NULL
perf2
cbind(perf, forest_sup = perf2[,3])

## End(Not run)

talegari/forager documentation built on May 3, 2019, 4:01 p.m.
