View source: R/classification.R
Private Evaporative Cooling feature selection and classification
privateEC(
  train.ds = NULL,
  holdout.ds = NULL,
  validation.ds = NULL,
  label = "class",
  method.model = "classification",
  is.simulated = TRUE,
  bias = 0.4,
  update.freq = 5,
  importance.name = "relieff",
  importance.algorithm = "ReliefFequalK",
  relief.k.method = "k_half_sigma",
  learner.name = "randomforest",
  xgb.obj = "binary:logistic",
  use.nestedCV = FALSE,
  ncv_folds = c(10, 10),
  learner.cv = NULL,
  rf.mtry = NULL,
  rf.ntree = 500,
  xgb.num.rounds = c(1),
  xgb.max.depth = c(4),
  xgb.shrinkage = c(1),
  start.temp = 0.1,
  final.temp = 1e-05,
  tau.param = 100,
  threshold = 4/sqrt(nrow(train.ds)),
  tolerance = 1/sqrt(nrow(train.ds)),
  signal.names = NULL,
  save.file = NULL,
  verbose = FALSE
)
train.ds |
A data frame with training data and outcome labels |
holdout.ds |
A data frame with holdout data and outcome labels |
validation.ds |
A data frame with validation data and outcome labels |
label |
A character vector of the outcome variable column name. |
method.model |
A character string naming the analysis type, "classification" or "regression". For classification, make the outcome column a factor; for regression, make the outcome column numeric. |
is.simulated |
A flag indicating whether the data are simulated (TRUE) or real (FALSE) |
bias |
A numeric for effect size in simulated signal variables |
update.freq |
An integer for the number of steps before each update |
importance.name |
A character vector containing the importance algorithm name |
importance.algorithm |
A character vector containing a specific importance algorithm subtype |
relief.k.method |
A character or numeric indicating the number of nearest neighbors for the Relief algorithm. Possible character values are: k_half_sigma (floor((num.samp-1)*0.154)), m6 (floor(num.samp/6)), myopic (floor((num.samp-1)/2)), and m4 (floor(num.samp/4)) |
learner.name |
A character vector containing the learner algorithm name |
xgb.obj |
A character vector containing the XGBoost objective function name |
use.nestedCV |
A logical indicating whether to use nested cross-validation |
ncv_folds |
A vector of integers for the number of nested cross-validation folds |
learner.cv |
An integer for the number of cross validation folds |
rf.mtry |
An integer for the number of variables used for node splits |
rf.ntree |
An integer for the number of trees in the random forest |
xgb.num.rounds |
A vector of integers for the number of xgboost algorithm iterations |
xgb.max.depth |
A vector of integers for the xgboost maximum tree depth |
xgb.shrinkage |
A vector of numerics for xgboost shrinkage values, each between 0 and 1 |
start.temp |
A numeric EC starting temperature |
final.temp |
A numeric EC final temperature |
tau.param |
A numeric tau to control temperature reduction schedule |
threshold |
A numeric threshold used within thresholdout; the default, 4 / sqrt(n) with n = nrow(train.ds), is suggested in the thresholdout supplementary material (Dwork et al., 2015) |
tolerance |
A numeric tolerance used within thresholdout; the default, 1 / sqrt(n) with n = nrow(train.ds), is suggested in the thresholdout supplementary material (Dwork et al., 2015) |
signal.names |
A character vector of signal names in simulated data |
save.file |
A character vector for results filename or NULL to skip |
verbose |
A flag indicating whether verbose output should be sent to stdout |
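The named relief.k.method options in the table above reduce to simple functions of the sample count. The following is a minimal sketch in base R that evaluates each documented formula; it is not the package's internal code, and the helper name relief_k is hypothetical, introduced only for illustration. Treating a numeric value as k directly is an assumption based on the argument description.

```r
# Hypothetical helper (not part of privateEC): evaluates the formulas
# documented for relief.k.method for a given number of samples.
relief_k <- function(num.samp, method = "k_half_sigma") {
  if (is.numeric(method)) {
    # Assumption: a numeric value is used as the neighbor count as given.
    return(method)
  }
  switch(method,
    k_half_sigma = floor((num.samp - 1) * 0.154),
    m6           = floor(num.samp / 6),
    myopic       = floor((num.samp - 1) / 2),
    m4           = floor(num.samp / 4),
    stop("unknown relief.k.method: ", method)
  )
}

# Compare the four named options for 100 samples.
methods <- c("k_half_sigma", "m6", "myopic", "m4")
k.values <- sapply(methods, relief_k, num.samp = 100)
print(k.values)
```

For 100 samples the four options span a wide range of neighborhood sizes, from k_half_sigma (the smallest) up to myopic (roughly half the samples).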
A list with:
data frame of results, a row for each update
melted results data frame for plotting with ggplot
number of variables detected correctly in each data set
name of the attributes in each iteration
name of the selected attributes using nested cross validation
total elapsed time
Within thresholdout, we choose a threshold of 4 / sqrt(n) and tolerance of 1 / sqrt(n) as suggested in the thresholdout’s supplementary material (Dwork, et al., 2015).
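As a quick illustration of the defaults described in this note, the sketch below computes the threshold and tolerance from the number of training rows using base R only. The toy data frame and the variable names sample.sizes and defaults are assumptions made for the example, not part of the package.

```r
# Toy training set; only the number of rows matters for these defaults.
set.seed(42)
train.ds <- data.frame(x1 = rnorm(100),
                       class = factor(rep(c(0, 1), times = 50)))

n <- nrow(train.ds)
threshold <- 4 / sqrt(n)  # default threshold from the usage signature
tolerance <- 1 / sqrt(n)  # default tolerance from the usage signature

# Both defaults shrink as the training set grows.
sample.sizes <- c(100, 400, 1600)
defaults <- data.frame(n = sample.sizes,
                       threshold = 4 / sqrt(sample.sizes),
                       tolerance = 1 / sqrt(sample.sizes))
print(defaults)
```

Because both values scale with 1 / sqrt(n), quadrupling the number of training samples halves the threshold and the tolerance.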
Trang Le, W. K. Simmons, M. Misaki, B.C. White, J. Savitz, J. Bodurka, and B. A. McKinney. “Differential privacy-based Evaporative Cooling feature selection and classification with Relief-F and Random Forests,” Bioinformatics. Accepted. https://doi.org/10.1093/bioinformatics/btx298. 2017
For more information see: Insilico Lab privateEC Page
Other classification: epistasisRank(), getImportanceScores(), originalThresholdout(), privateRF(), standardRF(), xgboostRF()
library(privateEC)

# Simulate a main-effect data set split evenly into
# training, holdout and validation partitions
num.samples <- 100
num.variables <- 100
pct.signals <- 0.1
label <- "class"
sim.data <- createSimulation(num.samples = num.samples,
                             num.variables = num.variables,
                             pct.signals = pct.signals,
                             label = label,
                             pct.train = 1 / 3,
                             pct.holdout = 1 / 3,
                             pct.validation = 1 / 3,
                             sim.type = "mainEffect",
                             verbose = FALSE)

# Private Evaporative Cooling with Relief-F importance and a random forest learner
pec.results <- privateEC(train.ds = sim.data$train,
                         holdout.ds = sim.data$holdout,
                         validation.ds = sim.data$validation,
                         label = sim.data$label,
                         is.simulated = TRUE,
                         importance.name = "relieff",
                         learner.name = "randomforest",
                         signal.names = sim.data$signal.names,
                         verbose = FALSE)

# Same data with an xgboost learner and a maximum tree depth of 5
pec.results <- privateEC(train.ds = sim.data$train,
                         holdout.ds = sim.data$holdout,
                         validation.ds = sim.data$validation,
                         label = sim.data$label,
                         is.simulated = TRUE,
                         learner.name = "xgboost",
                         xgb.max.depth = 5,
                         signal.names = sim.data$signal.names,
                         verbose = FALSE)
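The examples above use simulated data. For real data, the three data frames have to be prepared by hand. The sketch below shows one way to do this in base R, assuming a two-class subset of the built-in iris data as a stand-in. The even three-way split and the names real.ds and splits are assumptions for illustration. The final privateEC() call is shown commented out because it has not been run here; it uses only arguments listed in the usage section.

```r
set.seed(1)

# Stand-in for a real data set: two iris species, outcome renamed to "class".
real.ds <- droplevels(iris[iris$Species != "setosa", ])
names(real.ds)[names(real.ds) == "Species"] <- "class"

# For classification the outcome column should be a factor.
real.ds$class <- factor(real.ds$class)

# Randomly assign each row to one of three roughly equal partitions.
partition <- sample(rep(1:3, length.out = nrow(real.ds)))
splits <- split(real.ds, partition)
train.ds      <- splits[["1"]]
holdout.ds    <- splits[["2"]]
validation.ds <- splits[["3"]]

# Untested call, requires the privateEC package:
# pec.results <- privateEC(train.ds = train.ds,
#                          holdout.ds = holdout.ds,
#                          validation.ds = validation.ds,
#                          label = "class",
#                          is.simulated = FALSE,
#                          verbose = FALSE)
```

With real data there are no known signal variables, so is.simulated is set to FALSE and signal.names is left at its default of NULL.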