privateEC: Private Evaporative Cooling feature selection and...


View source: R/classification.R

Description

Private Evaporative Cooling feature selection and classification

Usage

privateEC(train.ds = NULL, holdout.ds = NULL, validation.ds = NULL,
  label = "class", is.simulated = TRUE, bias = 0.4, update.freq = 5,
  importance.name = "relieff", importance.algorithm = "ReliefFbestK",
  learner.name = "randomforest", xgb.obj = "binary:logistic",
  use.nestedCV = TRUE, ncv_folds = c(10, 10), learner.cv = NULL,
  rf.mtry = NULL, rf.ntree = 500, xgb.num.rounds = c(1),
  xgb.max.depth = c(4), xgb.shrinkage = c(1), start.temp = 0.1,
  final.temp = 1e-05, tau.param = 100, threshold = 4/sqrt(nrow(train.ds)),
  tolerance = 1/sqrt(nrow(train.ds)), signal.names = NULL,
  save.file = NULL, verbose = FALSE)

Arguments

train.ds

A data frame with training data and outcome labels

holdout.ds

A data frame with holdout data and outcome labels

validation.ds

A data frame with validation data and outcome labels

label

A character string naming the outcome variable column

bias

A numeric effect size for the simulated signal variables

update.freq

An integer giving the number of steps between updates

importance.name

A character string containing the importance algorithm name

importance.algorithm

A character string containing a specific importance algorithm subtype

learner.name

A character string containing the learner algorithm name

use.nestedCV

A logical indicating whether to use nested cross-validation

ncv_folds

A vector of integers for the number of nested cross-validation folds

learner.cv

An integer for the number of cross-validation folds

rf.mtry

An integer for the number of variables used for node splits

xgb.num.rounds

A vector of integers for the number of xgboost boosting iterations

xgb.max.depth

A vector of integers for the xgboost maximum tree depth

xgb.shrinkage

A numeric vector of xgboost shrinkage (learning rate) values between 0 and 1

start.temp

A numeric starting temperature for evaporative cooling (EC)

final.temp

A numeric final temperature for evaporative cooling (EC)

tau.param

A numeric tau controlling the temperature reduction schedule

threshold

A numeric thresholdout threshold; the default, 4 / sqrt(n) with n = nrow(train.ds), is suggested in the thresholdout supplementary material (Dwork et al., 2015)

tolerance

A numeric thresholdout tolerance; the default, 1 / sqrt(n) with n = nrow(train.ds), is suggested in the thresholdout supplementary material (Dwork et al., 2015)

signal.names

A character vector of signal names in simulated data

save.file

A character string giving the results file name, or NULL to skip saving

verbose

A logical indicating whether verbose output is sent to stdout
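The arguments start.temp, final.temp and tau.param together define how the EC temperature is lowered. The exact schedule used inside privateEC is not stated on this page, so the following is only a minimal sketch assuming an exponential decay, T_k = start.temp * exp(-k / tau.param), stopped once the temperature would fall below final.temp. The function name cooling_schedule is hypothetical and not part of the package.

```r
# Hypothetical exponential cooling schedule (an assumption, not privateEC's
# confirmed internals): the temperature decays from start.temp by a factor
# exp(-1 / tau.param) per step and stops before it drops below final.temp.
cooling_schedule <- function(start.temp = 0.1, final.temp = 1e-05,
                             tau.param = 100) {
  # Largest step k with start.temp * exp(-k / tau.param) >= final.temp
  n.steps <- floor(tau.param * log(start.temp / final.temp))
  start.temp * exp(-(0:n.steps) / tau.param)
}

# Using the privateEC default values for the three arguments
temps <- cooling_schedule()
length(temps)  # 922 temperatures, for steps 0 through 921
```

Under this assumption, a larger tau.param cools more slowly and therefore yields more steps between start.temp and final.temp.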

Value

A list with:

algo.acc

data frame of results, one row per update

ggplot.data

melted results data frame for plotting with ggplot

correct

number of signal variables detected correctly in each data set

atts.remain

names of the attributes remaining at each iteration

ncv.atts

names of the attributes selected using nested cross-validation

elapsed

total elapsed time
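For simulated data, a count like the one reported in correct can be obtained by intersecting a set of selected attributes with the known signal.names. This is a hedged sketch of that idea for a single attribute set, not the package's own code; the helper count_correct and the example variable names are hypothetical.

```r
# Hypothetical helper: number of selected attributes that are true signals
count_correct <- function(selected, signal.names) {
  length(intersect(selected, signal.names))
}

# Illustrative names only: 10 signal variables, 3 selected attributes
signals <- paste0("var", 1:10)
selected <- c("var1", "var7", "var42")
count_correct(selected, signals)  # 2 (var1 and var7)
```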

Note

Within thresholdout, we choose a threshold of 4 / sqrt(n) and tolerance of 1 / sqrt(n) as suggested in the thresholdout’s supplementary material (Dwork, et al., 2015).
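To make the role of these two values concrete, here is a minimal single-query sketch of the Thresholdout mechanism from Dwork et al. (2015) in base R. It assumes that tolerance plays the role of the Laplace noise scale, and it omits the query budget and the persistent noisy threshold of the full mechanism; it is not privateEC's internal implementation. The functions rlaplace and thresholdout_query and the training size n = 33 are hypothetical.

```r
# Laplace(0, scale) draws: the difference of two iid exponentials with
# rate 1 / scale is Laplace-distributed with that scale
rlaplace <- function(n, scale) {
  rexp(n, rate = 1 / scale) - rexp(n, rate = 1 / scale)
}

# Single Thresholdout query (simplified: no budget, threshold noise redrawn
# on every call). tolerance is assumed to act as the noise scale.
thresholdout_query <- function(train.acc, holdout.acc, threshold, tolerance) {
  noisy.threshold <- threshold + rlaplace(1, 2 * tolerance)
  eta <- rlaplace(1, 4 * tolerance)
  if (abs(holdout.acc - train.acc) > noisy.threshold + eta) {
    holdout.acc + rlaplace(1, tolerance)  # large gap: noisy holdout value
  } else {
    train.acc  # small gap: report the training value
  }
}

# Defaults from the Note, for a hypothetical training set of 33 rows
n <- 33
threshold <- 4 / sqrt(n)
tolerance <- 1 / sqrt(n)
set.seed(1)
thresholdout_query(0.80, 0.78, threshold, tolerance)
```

Because the holdout value is only released, with added noise, when it disagrees substantially with the training value, repeated queries leak less information about the holdout set.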

References

Trang Le, W. K. Simmons, M. Misaki, B. C. White, J. Savitz, J. Bodurka, and B. A. McKinney. “Differential privacy-based Evaporative Cooling feature selection and classification with Relief-F and Random Forests.” Bioinformatics (accepted), 2017. https://doi.org/10.1093/bioinformatics/btx298

For more information see: Insilico Lab privateEC Page

See Also

Other classification: epistasisRank, getImportanceScores, originalThresholdout, privateRF, standardRF, xgboostRF

Examples

library(privateEC)

# Simulate 100 samples x 100 variables, 10% of them carrying a main-effect
# signal, split equally into training, holdout and validation sets
num.samples <- 100
num.variables <- 100
pct.signals <- 0.1
sim.data <- createSimulation(num.samples = num.samples,
                             num.variables = num.variables,
                             pct.signals = pct.signals,
                             pct.train = 1 / 3,
                             pct.holdout = 1 / 3,
                             pct.validation = 1 / 3,
                             sim.type = "mainEffect",
                             verbose = FALSE)

# privateEC with Relief-F importance scores and a random forest learner
pec.results.rf <- privateEC(train.ds = sim.data$train,
                            holdout.ds = sim.data$holdout,
                            validation.ds = sim.data$validation,
                            label = sim.data$label,
                            importance.name = "relieff",
                            learner.name = "randomforest",
                            is.simulated = TRUE,
                            signal.names = sim.data$signal.names,
                            verbose = FALSE)

# privateEC with an xgboost learner and a maximum tree depth of 5
pec.results.xgb <- privateEC(train.ds = sim.data$train,
                             holdout.ds = sim.data$holdout,
                             validation.ds = sim.data$validation,
                             label = sim.data$label,
                             learner.name = "xgboost",
                             xgb.max.depth = 5,
                             is.simulated = TRUE,
                             signal.names = sim.data$signal.names,
                             verbose = FALSE)

hexhead/privateEC documentation built on July 20, 2018, 12:30 p.m.