An R reference class that calculates p-values for individual predictions within the conformal prediction framework.
The reference class ConformalClassification contains the following fields:
ClassificationModel: stores a random forest classification model.
confidence: stores the user-defined confidence level.
data.new: stores the descriptors of the external set.
NonconformityScoresMatrix: stores the nonconformity scores matrix.
ClassPredictions: stores the class predictions calculated for the external set.
p.values: a list with two elements:
P.values: a data.frame containing the p-values calculated for the external set. Rows are indexed by datapoints and columns by classes. The row names match the row names of the external set.
Significance_p.values: a data.frame indicating whether each p-value is significant (1) or not (0) at the user-defined confidence level, ε (default 0.8). Rows are indexed by datapoints and columns by classes. The row names match the row names of the external set.
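The layout of the two elements of p.values can be illustrated with made-up numbers. The sketch below is not camb code: the helper `significance_matrix` is hypothetical, and it assumes that a class is flagged as significant (1) when its p-value exceeds a threshold of 1 - confidence (0.2 for the default confidence of 0.8). The package's exact rule may differ.

```r
# Made-up p-values for three datapoints (rows) and two classes (columns).
# Row names mimic the row names of the external set.
P.values <- data.frame(
  "1" = c(0.65, 0.05, 0.30),
  "2" = c(0.10, 0.70, 0.25),
  row.names = c("mol_a", "mol_b", "mol_c"),
  check.names = FALSE
)

confidence <- 0.8

# Hypothetical helper (not part of the package). Assumed rule: flag a class
# with 1 when its p-value exceeds 1 - confidence, otherwise 0.
significance_matrix <- function(p, confidence) {
  flags <- (as.matrix(p) > 1 - confidence) * 1L  # keeps row and column names
  as.data.frame(flags)
}

# A list with the same two elements as the p.values field.
p.values <- list(
  P.values = P.values,
  Significance_p.values = significance_matrix(P.values, confidence)
)
print(p.values$Significance_p.values)
```

Under this assumed rule, mol_a is flagged only for class 1, mol_b only for class 2, and mol_c for both classes.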
The class ConformalClassification contains the following methods:
initialize: called when an instance of the class is created. The default confidence level is 0.8.
CalculateCVScores: calculates the nonconformity scores (or probabilities) matrix from the cross-validation predictions of the input randomForest model, which must have been trained with k-fold cross-validation. The matrix is stored in the field NonconformityScoresMatrix.
CalculatePValues: calculates the p-values for the datapoints in an external set. The class predictions are stored in the field ClassPredictions, and the p-values, together with their significance at the user-defined confidence level, are stored in the field p.values.
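The idea behind these two methods can be sketched with class-conditional conformal p-values, a common choice for conformal classification. This is a minimal illustration under stated assumptions, not the package's source: the helpers `calibration_scores` and `conformal_p_values` are hypothetical, cross-validated class probabilities serve as the scores (a higher probability means the example conforms better), and the p-value of class k for a new datapoint is (number of calibration examples of class k whose probability for k is at most the new datapoint's probability for k, plus 1) divided by (number of calibration examples of class k, plus 1).

```r
# Hypothetical stand-in for CalculateCVScores: for each class k, keep the
# cross-validated probability of class k for training examples observed as k.
calibration_scores <- function(cv_probs, observed) {
  classes <- colnames(cv_probs)
  scores <- lapply(classes, function(k) cv_probs[observed == k, k])
  setNames(scores, classes)
}

# Hypothetical stand-in for CalculatePValues: p-value of every new datapoint
# (rows) for every class (columns), using only that class's calibration scores.
conformal_p_values <- function(cal, new_probs) {
  p <- vapply(names(cal), function(k) {
    scores <- cal[[k]]
    # Count calibration scores that are no larger than each new score.
    n_le <- colSums(outer(scores, new_probs[, k], "<="))
    (n_le + 1) / (length(scores) + 1)
  }, numeric(nrow(new_probs)))
  p <- matrix(p, nrow = nrow(new_probs),
              dimnames = list(rownames(new_probs), names(cal)))
  as.data.frame(p)
}

# Made-up cross-validated class probabilities for six training examples.
cv_probs <- matrix(
  c(0.9, 0.8, 0.6, 0.3, 0.2, 0.4,   # probability of class "1"
    0.1, 0.2, 0.4, 0.7, 0.8, 0.6),  # probability of class "2"
  ncol = 2, dimnames = list(NULL, c("1", "2"))
)
observed <- factor(c("1", "1", "1", "2", "2", "2"))

# Made-up predicted probabilities for two new datapoints.
new_probs <- matrix(
  c(0.85, 0.35, 0.15, 0.65), ncol = 2,
  dimnames = list(c("new_a", "new_b"), c("1", "2"))
)

cal <- calibration_scores(cv_probs, observed)
p_values <- conformal_p_values(cal, new_probs)
print(p_values)
```

For these numbers, new_a gets p-values 0.75 and 0.25 for classes 1 and 2, and new_b gets 0.25 and 0.5. With only three calibration examples per class the smallest attainable p-value is 1/(3 + 1) = 0.25, so under a "p-value above 1 - confidence" rule no class could be ruled out at confidence 0.8; the calibration set has to be reasonably large for the p-values to be informative.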
Author(s): Isidro Cortes-Ciriano <isidrolauscher@gmail.com>
References: Norinder et al., J. Chem. Inf. Model., 2014, 54 (6), pp. 1596-1603. DOI: 10.1021/ci5001168. http://pubs.acs.org/doi/abs/10.1021/ci5001168
# Assumed: camb is the package providing ConformalClassification and data(LogS)
library(camb)
library(caret)  # train() and trainControl()
showClass("ConformalClassification")
# Optional: parallel training
#library(doMC)
#registerDoMC(cores=4)
data(LogS)
# convert data to categorical
LogSTrain[LogSTrain > -4] <- 1
LogSTrain[LogSTrain <= -4] <- 2
LogSTest[LogSTest > -4] <- 1
LogSTest[LogSTest <= -4] <- 2
LogSTrain <- factor(LogSTrain)
LogSTest <- factor(LogSTest)
# Remove part of the data to allow for quick training
LogSTrain <- LogSTrain[1:20]
LogSTest <- LogSTest[1:20]
LogSDescsTrain <- LogSDescsTrain[1:20,]
LogSDescsTest <- LogSDescsTest[1:20,]
algorithm <- "rf"
# 5-fold cross-validation; savePredictions = TRUE keeps the held-out predictions
trControl <- trainControl(method = "cv", number = 5, savePredictions = TRUE)
set.seed(3)
#number of trees
nb_trees <- 100
model <- train(LogSDescsTrain, LogSTrain,
               method = algorithm, type = "classification",
               trControl = trControl, predict.all = TRUE,
               keep.forest = TRUE, norm.votes = TRUE,
               ntree = nb_trees)
# Instantiate the class and get the p.values
example <- ConformalClassification$new()
example$CalculateCVScores(model=model)
example$CalculatePValues(new.data=LogSDescsTest)
# Extract the p-values
example$p.values$P.values
# Extract the significance of these p-values
example$p.values$Significance_p.values
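To read the significance output row by row, it can help to list, for each datapoint, the classes flagged with 1. The sketch below uses a made-up data.frame with the same layout as Significance_p.values (datapoints in rows, classes in columns) so that it runs without the package; the helper `flagged_classes` is hypothetical.

```r
# Made-up significance flags laid out like Significance_p.values.
sig <- data.frame(
  "1" = c(1L, 0L, 1L, 0L),
  "2" = c(0L, 1L, 1L, 0L),
  row.names = c("mol_a", "mol_b", "mol_c", "mol_d"),
  check.names = FALSE
)

# Hypothetical helper: names of the classes (columns) flagged 1 in each row.
flagged_classes <- function(sig) {
  sets <- lapply(seq_len(nrow(sig)), function(i) names(sig)[unlist(sig[i, ]) == 1])
  setNames(sets, rownames(sig))
}

sets <- flagged_classes(sig)
str(sets)
# How many datapoints have 0, 1 or 2 flagged classes.
table(lengths(sets))
```

To apply it to a real result, pass example$p.values$Significance_p.values in place of the toy data.frame.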