generate_synthetic_data: Generate synthetic data for random forest


View source: R/generate_synthetic_data.R

Description

Unsupervised learning with random forests, as suggested by Breiman (https://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm#unsup), involves creating synthetic data by sampling at random from the univariate distribution of each covariate (feature). This function supports two methods. In the first, each covariate is sampled at random from its observed distribution, so the proportions of the observed values are taken into account. In the second, each covariate is sampled assuming a uniform distribution. The former corresponds to "Addcl1" in Shi and Horvath's paper (Unsupervised Learning With Random Forest Predictors, Tao Shi & Steve Horvath) and the latter corresponds to "Addcl2".
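The two sampling schemes can be illustrated with a minimal base R sketch. This is not the forager implementation: the helper name sketch_synthetic is hypothetical, and the handling of non-numeric columns under uniform sampling (drawing uniformly from the distinct observed values) is an assumption. In both schemes every column is sampled independently of the others, which is what removes the dependence between covariates.

```r
# Hypothetical helper for illustration only; not the forager source code.
sketch_synthetic <- function(dataset, prop = TRUE, seed = 1L) {
  stopifnot(is.data.frame(dataset))
  set.seed(seed)
  n <- nrow(dataset)

  sample_column <- function(x) {
    if (prop) {
      # Addcl1-style: resample the observed values with replacement, so the
      # empirical marginal distribution of the column is preserved.
      return(x[sample.int(n, n, replace = TRUE)])
    }
    if (is.numeric(x)) {
      # Addcl2-style: uniform draw over the observed range of the column.
      runif(n, min = min(x, na.rm = TRUE), max = max(x, na.rm = TRUE))
    } else {
      # Assumption: non-numeric columns are drawn uniformly from the
      # distinct observed values, ignoring their frequencies.
      vals <- unique(x[!is.na(x)])
      vals[sample.int(length(vals), n, replace = TRUE)]
    }
  }

  # Replace every column in place; column names and row count are kept.
  dataset[] <- lapply(dataset, sample_column)
  dataset
}

synthetic_prop    <- sketch_synthetic(iris, prop = TRUE,  seed = 42)
synthetic_uniform <- sketch_synthetic(iris, prop = FALSE, seed = 42)
```

Note that in this sketch uniform sampling turns integer columns into doubles, because runif returns real numbers.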

Usage

generate_synthetic_data(dataset, prop, seed)

Arguments

dataset

A dataframe

prop

If TRUE, each covariate is sampled at random from its observed distribution to generate the synthetic data. Otherwise, each covariate is sampled from a uniform distribution.

seed

Seed for sampling.

Value

A dataframe with synthetic data.
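In Breiman's unsupervised approach, the synthetic rows are stacked with the observed rows and labelled as a second class, so that a classifier can be trained to tell the two apart. A minimal base R sketch of that stacking step follows. The synthetic data frame here is a stand-in built by resampling each column independently, and the column name origin is a hypothetical choice; neither is taken from forager.

```r
# Stand-in for the returned data frame: each column resampled independently.
set.seed(1)
real <- iris[, 1:4]
synthetic <- real
synthetic[] <- lapply(real, function(x) sample(x, length(x), replace = TRUE))

# Label observed rows and synthetic rows as two classes, then stack them.
class_levels <- c("real", "synthetic")
combined <- rbind(
  cbind(real,      origin = factor("real",      levels = class_levels)),
  cbind(synthetic, origin = factor("synthetic", levels = class_levels))
)

# Class counts: one synthetic row per observed row.
table(combined$origin)
```

The combined data frame can then be passed to any classifier, with origin as the response.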


talegari/forager documentation built on May 3, 2019, 4:01 p.m.