CSMES.predict | R Documentation |
This function generates predictions for a new data set (containing candidate member library predictions) using a CSMES model. Using the Pareto-optimal ensemble definitions generated by CSMES.ensSel and the ensemble nomination front generated by CSMES.ensNomCurve, final ensemble predictions are produced based on the cost information known to the user at the time of model scoring. Three scenarios are supported: (1) the candidate ensemble is nominated for a specific cost ratio ("minEMC"), (2) the ensemble is nominated based on partial AUCC (or a distribution over operating points) ("minPartAUCC"), and (3) the candidate ensemble that is optimal over the entire cost space, measured by the area under the cost or Brier curve, is chosen ("minAUCC").
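To make the first scenario concrete, the following base R sketch nominates one of several candidate ensembles for a given cost ratio. It is an illustration only, not the package's implementation: the `candidates` data frame, the helper names `expectedCost` and `nominate`, the `prior` argument, and the reading of the cost ratio as "cost of a false negative relative to a false positive" are all assumptions made for this example.

```r
# Hypothetical operating points of three Pareto-optimal candidate ensembles
candidates <- data.frame(
  ensemble = c("A", "B", "C"),
  FNR = c(0.10, 0.25, 0.40),   # false negative rates
  FPR = c(0.40, 0.20, 0.05),   # false positive rates
  stringsAsFactors = FALSE
)

# Expected misclassification cost per case, in units of one false positive.
# Assumption: costRatio = cost(false negative) / cost(false positive).
expectedCost <- function(fnr, fpr, costRatio, prior = 0.5) {
  prior * fnr * costRatio + (1 - prior) * fpr
}

# Nominate the candidate with the lowest expected cost for this cost ratio
nominate <- function(candidates, costRatio, prior = 0.5) {
  emc <- expectedCost(candidates$FNR, candidates$FPR, costRatio, prior)
  candidates$ensemble[which.min(emc)]
}

nominate(candidates, costRatio = 5)    # false negatives expensive: nominates "A"
nominate(candidates, costRatio = 0.2)  # false positives expensive: nominates "C"
```

The point of the sketch is that no single candidate is best for every cost ratio, which is why the nomination is deferred until the cost information is known at scoring time.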
CSMES.predict(
  ensSelModel,
  ensNomCurve,
  newdata,
  criterion = c("minEMC", "minAUCC", "minPartAUCC"),
  costRatio = 5,
  partAUCC_mu = 0.5,
  partAUCC_sd = 0.1
)
ensSelModel |
ensemble selection model (output of |
ensNomCurve |
ensemble nomination curve object (output of |
newdata |
matrix containing ensemble library member model predictions for new data set |
criterion |
This argument specifies which criterion determines the selection of the ensemble candidate that delivers predictions. Can be one of three options: "minEMC", "minAUCC" or "minPartAUCC". |
costRatio |
Specifies the cost ratio used to determine expected misclassification cost. Only relevant when |
partAUCC_mu |
Desired mean operating condition when |
partAUCC_sd |
Desired standard deviation when |
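The role of partAUCC_mu and partAUCC_sd can be pictured with a small base R sketch that weights each candidate's cost line by a distribution over operating points. This is a guess at the idea, not the package's code: the normal weighting, the grid `pc`, the `candidates` data frame and the helper `nominatePartial` are assumptions for illustration. The cost line itself follows the standard cost curve form, normalised expected cost = FNR * pc + FPR * (1 - pc).

```r
# Hypothetical operating points of three candidate ensembles
candidates <- data.frame(
  ensemble = c("A", "B", "C"),
  FNR = c(0.10, 0.25, 0.40),
  FPR = c(0.40, 0.20, 0.08),
  stringsAsFactors = FALSE
)

# Grid over the normalised cost axis: near 0 false positives dominate the
# cost, near 1 false negatives dominate
pc <- seq(0, 1, by = 0.01)

# Normalised expected cost of each candidate at each grid point
# (rows = grid points, columns = candidates)
costLines <- outer(pc, candidates$FNR) + outer(1 - pc, candidates$FPR)

# Weight the cost lines by a normal density centred on the expected
# operating condition and nominate the candidate with the lowest weighted cost
nominatePartial <- function(mu, sd) {
  w <- dnorm(pc, mean = mu, sd = sd)
  w <- w / sum(w)                          # weights sum to one
  weightedCost <- colSums(w * costLines)   # each column is scaled row-wise by w
  candidates$ensemble[which.min(weightedCost)]
}

nominatePartial(mu = 0.5, sd = 0.1)   # balanced costs: nominates "B"
nominatePartial(mu = 0.9, sd = 0.05)  # false negatives dominate: nominates "A"
```

Because each candidate here has a single straight cost line, only the mean of the weighting matters; with piecewise cost curves the standard deviation would also change the outcome.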
A list with the following components:
pred |
A matrix with model predictions. Both class and probability predictions are delivered. |
criterion |
The criterion specified to determine the selection of the ensemble candidate. |
costRatio |
The cost ratio in function of which the |
Koen W. De Bock, kdebock@audencia.com
De Bock, K.W., Lessmann, S. and Coussement, K., Cost-sensitive business failure prediction when misclassification costs are uncertain: A heterogeneous ensemble selection approach, European Journal of Operational Research (2020), doi: 10.1016/j.ejor.2020.01.052.
CSMES.ensSel, CSMES.predictPareto, CSMES.ensNomCurve
## load packages and data (BFP ships with the CSMES package)
library(CSMES)
library(rpart)
library(zoo)
library(ROCR)
library(mco)
data(BFP)
## shuffle the rows, then split into training, validation and test sets
BFP_r <- BFP[sample(nrow(BFP), nrow(BFP)), ]
size <- nrow(BFP_r)
## size <- 300
train <- BFP_r[1:floor(size/3), ]
val <- BFP_r[ceiling(size/3):floor(2*size/3), ]
test <- BFP_r[ceiling(2*size/3):size, ]
## generate a list of 100 CART decision trees varying in the cp and minsplit
## parameters, each trained on a bootstrap sample of the training rows (bagging)
rpartSpecs <- list()
for (i in 1:100) {
  ## sample rows (not columns) with replacement
  bag <- train[sample(1:nrow(train), size = nrow(train), replace = TRUE), ]
  rpartSpecs[[paste0("rpart", i)]] <- rpart(Class ~ ., data = bag, method = "class",
    control = rpart.control(minsplit = round(runif(1, min = 1, max = 20)),
                            cp = runif(1, min = 0.05, max = 0.4)))
}
## score the validation set with every tree (probability of the second class)
hillclimb <- mat.or.vec(nrow(val), 100)
for (i in 1:100) {
  hillclimb[, i] <- predict(rpartSpecs[[i]], newdata = val)[, 2]
}
## run ensemble selection on the validation-set predictions
ESmodel <- CSMES.ensSel(hillclimb, val$Class, obj1 = "FNR", obj2 = "FPR", selType = "selection",
    generations = 10, popsize = 12, plot = TRUE)
## create the ensemble nomination curve
enc <- CSMES.ensNomCurve(ESmodel, hillclimb, val$Class, curveType = "costCurve", method = "classPreds",
    plot = FALSE)
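The example ends once the nomination curve is built and never calls CSMES.predict itself. One possible continuation is sketched below. The helpers `predictLibrary` and `scoreNewData` are hypothetical and not part of CSMES; the sketch assumes that each library member returns a two-column probability matrix from `predict()`, as the rpart classification trees in the example do, and it relies only on the CSMES.predict arguments documented on this page.

```r
# Hypothetical helper (not part of CSMES): one column of second-class
# probabilities per library member, one row per case in `newdata`
predictLibrary <- function(models, newdata) {
  preds <- vapply(models,
                  function(m) as.numeric(predict(m, newdata = newdata)[, 2]),
                  numeric(nrow(newdata)))
  # force a matrix even when newdata has a single row
  matrix(preds, nrow = nrow(newdata))
}

# Hypothetical wrapper: builds the member prediction matrix and passes it
# to CSMES.predict() as `newdata`
scoreNewData <- function(ensSelModel, ensNomCurve, models, newdata,
                         criterion = "minEMC", costRatio = 5) {
  CSMES::CSMES.predict(ensSelModel, ensNomCurve,
                       newdata = predictLibrary(models, newdata),
                       criterion = criterion, costRatio = costRatio)
}
```

With the objects from the example, `res <- scoreNewData(ESmodel, enc, rpartSpecs, test, criterion = "minEMC", costRatio = 5)` should return the list described earlier on this page, with the class and probability predictions in `res$pred`.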