privateRF: Private random forests algorithm

View source: R/classification.R

Description

Random Forest Thresholdout: the Thresholdout (TO) algorithm with its feature selection and classifier steps replaced by a random forest.
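
For orientation, the core Thresholdout comparison (Dwork et al., 2015) can be sketched in a few lines of base R. This is a simplified illustration, not the code privateRF runs: the helper names rlaplace1 and thresholdout_step are hypothetical, and the use of Laplace noise scaled by the tolerance is an assumption made for the sketch.

```r
# Draw one Laplace(0, scale) variate as the difference of two exponentials.
# (Hypothetical helper; base R has no built-in Laplace sampler.)
rlaplace1 <- function(scale) {
  rexp(1, rate = 1 / scale) - rexp(1, rate = 1 / scale)
}

# Simplified Thresholdout step for one statistic, such as classification accuracy.
# If the training and holdout values agree to within a noisy threshold, report
# the training value; otherwise report the holdout value with added noise.
thresholdout_step <- function(train.stat, holdout.stat, threshold, tolerance) {
  if (abs(train.stat - holdout.stat) < threshold + rlaplace1(tolerance)) {
    train.stat
  } else {
    holdout.stat + rlaplace1(tolerance)
  }
}

set.seed(1)
n <- 100
agree <- thresholdout_step(0.80, 0.80, threshold = 4 / sqrt(n), tolerance = 1 / sqrt(n))
differ <- thresholdout_step(0.95, 0.05, threshold = 0.01, tolerance = 0.001)
```

When the two statistics agree, the holdout set is not consulted for the reported value, which is how Thresholdout limits how much information about the holdout leaks across repeated queries.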

Usage

privateRF(
  train.ds = NULL,
  holdout.ds = NULL,
  validation.ds = NULL,
  label = "class",
  is.simulated = TRUE,
  rf.importance.measure = "MeanDecreaseGini",
  rf.ntree = 500,
  rf.mtry = NULL,
  pec.file = NULL,
  update.freq = 50,
  threshold = 4/sqrt(nrow(train.ds)),
  tolerance = 1/sqrt(nrow(train.ds)),
  signal.names = NULL,
  save.file = NULL,
  verbose = FALSE
)

Arguments

train.ds

A data frame with training data and outcome labels

holdout.ds

A data frame with holdout data and outcome labels

validation.ds

A data frame with validation data and outcome labels

label

A character string naming the outcome variable column

is.simulated

A flag indicating whether the data is simulated (TRUE) or real (FALSE)

rf.importance.measure

A character vector for the random forest importance measure

rf.ntree

An integer for the number of trees in the random forest

rf.mtry

An integer for the number of variables sampled at each random forest node split

pec.file

A character string giving the filename of saved privateEC results

update.freq

An integer for the number of steps before update

threshold

A numeric; the default, 4 / sqrt(n), is suggested in the Thresholdout supplementary material (Dwork et al., 2015)

tolerance

A numeric; the default, 1 / sqrt(n), is suggested in the Thresholdout supplementary material (Dwork et al., 2015)

signal.names

A character vector of signal names in simulated data

save.file

A character string giving the results filename, or NULL to skip saving

verbose

A flag indicating whether verbose output should be sent to stdout
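
The default threshold and tolerance depend only on the number of rows in train.ds. The following base R snippet shows the values they take for a hypothetical training set of 100 rows; the data frame here is a stand-in built only for illustration.

```r
# Hypothetical training set with 100 rows, used only to show the defaults
train.ds <- data.frame(x = seq_len(100), class = rep(c(0, 1), times = 50))

n <- nrow(train.ds)
threshold <- 4 / sqrt(n)  # default threshold: 4 / sqrt(100) = 0.4
tolerance <- 1 / sqrt(n)  # default tolerance: 1 / sqrt(100) = 0.1
```

Both defaults shrink as the training set grows, so larger data sets use a tighter comparison between training and holdout results.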

Value

A list containing:

algo.acc

data frame of results, a row for each update

ggplot.data

melted results data frame for plotting

correct

number of variables detected correctly in each data set

elapsed

total elapsed time

See Also

Other classification: epistasisRank(), getImportanceScores(), originalThresholdout(), privateEC(), standardRF(), xgboostRF()

Examples

num.samples <- 100
num.variables <- 100
pct.signals <- 0.1
label <- "class"
temp.pec.file <- tempfile(pattern = "pEc_temp", tmpdir = tempdir())
sim.data <- createSimulation(num.variables = num.variables,
                             num.samples = num.samples,
                             pct.signals = pct.signals,
                             label = label,
                             sim.type = "mainEffect",
                             pct.train = 1 / 3,
                             pct.holdout = 1 / 3,
                             pct.validation = 1 / 3,
                             verbose = FALSE)
pec.results <- privateEC(train.ds = sim.data$train,
                         holdout.ds = sim.data$holdout,
                         validation.ds = sim.data$validation,
                         label = sim.data$label,
                         is.simulated = TRUE,
                         signal.names = sim.data$signal.names,
                         save.file = temp.pec.file,
                         verbose = FALSE)
prf.results <- privateRF(train.ds = sim.data$train,
                         holdout.ds = sim.data$holdout,
                         validation.ds = sim.data$validation,
                         label = sim.data$label,
                         is.simulated = TRUE,
                         signal.names = sim.data$signal.names,
                         pec.file = temp.pec.file,
                         verbose = FALSE)

insilico/privateEC documentation built on May 22, 2020, 5:12 p.m.