ipmparty: IPM casewise with CIT-RF by 'party' for OOB samples

View source: R/ipmparty.R

ipmparty    R Documentation

Description

The IPM (intervention in prediction measure) for a case in the training set is computed using only the trees for which the case is out-of-bag (OOB). The case is put down each of these trees and travels from the root node to a leaf through a series of nodes. The variable used for the split at each node on that path is recorded, and for each tree the percentage of times each variable is selected along the case's path from the root to the terminal node is calculated. Note that this is not the percentage of times a split occurred on variable k in tree t; only the variables that intervened in the prediction of the case are counted. The IPM for the case is the average of those percentages over the trees for which the case is OOB. The random forest is based on conditional inference trees (CIT).
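The calculation described above can be sketched in base R on toy trees. This is only an illustration under stated assumptions: the nested-list tree layout, the helpers leaf, node, path_vars and ipm_case, and the variables a, b and c are all hypothetical and are not the internal structures or functions of party or IPMRF. Every toy tree is assumed to contain at least one split.

```r
# Toy sketch of the casewise IPM; hypothetical tree layout, not party internals.

# A leaf has var = NULL; an internal node stores the split variable, the cut
# point and its two children.
leaf <- function() list(var = NULL)
node <- function(var, cut, left, right) {
  list(var = var, cut = cut, left = left, right = right)
}

# Variables used for splitting along the path of case x from root to leaf.
path_vars <- function(tree, x) {
  used <- character(0)
  while (!is.null(tree$var)) {
    used <- c(used, tree$var)
    tree <- if (x[[tree$var]] <= tree$cut) tree$left else tree$right
  }
  used
}

# Average, over the trees where the case is OOB, of the per-tree proportion
# of path splits made on each variable.
ipm_case <- function(trees, oob, x, vars) {
  pct <- vapply(trees[oob], function(tr) {
    counts <- table(factor(path_vars(tr, x), levels = vars))
    as.numeric(counts) / sum(counts)
  }, FUN.VALUE = numeric(length(vars)))
  setNames(rowMeans(pct), vars)
}

t1 <- node("a", 5, leaf(), node("b", 2, leaf(), leaf()))  # path of x: a, b
t2 <- node("b", 0, node("a", 3, leaf(), leaf()), leaf())  # path of x: b
t3 <- node("c", 0, leaf(), leaf())                        # x is not OOB here

x   <- list(a = 7, b = 1, c = 4)
oob <- c(TRUE, TRUE, FALSE)

ipm <- ipm_case(list(t1, t2, t3), oob, x, c("a", "b", "c"))
print(ipm)
```

In this toy case the first tree contributes 1/2 for a and 1/2 for b, the second contributes 1 for b, and the third is ignored because the case is not OOB there. Note that a is a split variable in the second tree but off the case's path, so it does not count for that tree.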

Usage

ipmparty(marbol, da, ntree)

Arguments

marbol

Random forest obtained with cforest. Responses can be of any type supported by cforest: not only numerical or nominal, but also ordered responses, censored response variables and multivariate responses.

da

Data frame containing only the predictors (no responses) of the training set used for computing marbol. Each row corresponds to an observation and each column corresponds to a predictor. Predictors can be numeric, nominal or ordered factors.
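As a minimal sketch of preparing such a data frame, the response columns can be dropped by name rather than by position. The example uses the built-in mtcars data with carb, mpg and cyl taken as responses; the variable name responses is an assumption made for illustration.

```r
# Sketch only: build the predictor-only data frame by dropping responses by name.
responses <- c("carb", "mpg", "cyl")   # response columns of the fitted model
da <- mtcars[, setdiff(names(mtcars), responses)]

# Remaining columns are the predictors, in their original order.
names(da)
```

Selecting by name gives the same columns as mtcars[, 3:10] here, but does not depend on where the responses sit in the data frame.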

ntree

Number of trees in the random forest.

Details

All details are given in Epifanio (2017).

Value

Returns the IPM for the cases in the training set, estimated from the trees in which each case is an OOB observation. The result is a matrix with as many rows as there are cases in da and as many columns as there are predictors in da. IPM can be estimated for any kind of RF computed by cforest, including multivariate RF.
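A short sketch of how a matrix of this shape might be summarised. The matrix below is hand-made and hypothetical, not output of ipmparty, and the grouping factor groups is likewise an assumption used only to show a per-group average.

```r
# Hypothetical casewise IPM matrix: 4 cases (rows) x 3 predictors (columns).
ipm <- matrix(c(0.50, 0.50, 0.00,
                0.25, 0.75, 0.00,
                0.00, 0.60, 0.40,
                0.25, 0.55, 0.20),
              nrow = 4, byrow = TRUE,
              dimnames = list(NULL, c("a", "b", "c")))

# Global IPM: average of the casewise values for each predictor.
global_ipm <- colMeans(ipm)

# Predictors ordered from most to least important.
ranking <- names(sort(global_ipm, decreasing = TRUE))

# Per-group IPM: average the casewise rows within each (hypothetical) group.
groups <- factor(c("g1", "g1", "g2", "g2"))
group_ipm <- rowsum(ipm, groups) / as.vector(table(groups))

print(global_ipm)
print(ranking)
print(group_ipm)
```

colMeans(ipm) gives the same result as apply(ipm, 2, mean), the form used in the examples below.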

Note

See Epifanio (2017) for the advantages and limitations of IPM, and for the parameters to be used in cforest.

Author(s)

Irene Epifanio

References

Pierola, A., Epifanio, I. and Alemany, S. (2016) An ensemble of ordered logistic regression and random forest for child garment size matching. Computers & Industrial Engineering, 101, 455–465.

Epifanio, I. (2017) Intervention in prediction measure: a new approach to assessing variable importance for random forests. BMC Bioinformatics, 18, 230.

See Also

ipmpartynew, ipmrf, ipmranger, ipmrfnew, ipmrangernew, ipmgbmnew

Examples


#Note: more examples can be found at 
#https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-017-1650-8

## Not run: 
    ## -------------------------------------------------------
    ## Example from varimp in the party package
    ## Classification RF
    ## -------------------------------------------------------

    library(party)
    
    #from help in varimp by party package
    set.seed(290875)
    readingSkills.cf <- cforest(score ~ ., data = readingSkills,
                                control = cforest_unbiased(mtry = 2, ntree = 50))
    
    # standard importance
    varimp(readingSkills.cf)
    
    # the same modulo random variation
    varimp(readingSkills.cf, pre1.0_0 = TRUE)
    
    # conditional importance, may take a while...
    varimp(readingSkills.cf, conditional = TRUE)

## End(Not run)

#IPM based on CIT-RF (party package)
library(party)

ntree<-50
#readingSkills: data from party package
da<-readingSkills[,1:3] 
set.seed(290875)
readingSkills.cf3 <- cforest(score ~ ., data = readingSkills,
                             control = cforest_unbiased(mtry = 3, ntree = 50))

#IPM case-wise computed with OOB with party
pupf<-ipmparty(readingSkills.cf3 ,da,ntree)

#global IPM
pua<-apply(pupf,2,mean) 
pua


## Not run: 
    ## -------------------------------------------------------
    ## Example from var.select in the randomForestSRC package
    ## Multivariate mixed forests
    ## -------------------------------------------------------
    
    if(require("randomForestSRC")) {
    
        #from help in var.select by randomForestSRC package
        mtcars.new <- mtcars
        mtcars.new$cyl <- factor(mtcars.new$cyl)
        mtcars.new$carb <- factor(mtcars.new$carb, ordered = TRUE)
        mv.obj <- rfsrc(cbind(carb, mpg, cyl) ~., data = mtcars.new,
                        importance = TRUE)
        var.select(mv.obj, method = "vh.vimp", nrep = 10) 
        
        #different variables are selected if var.select is repeated
    }

## End(Not run)

#IPM based on CIT-RF (party package)
if(require("randomForestSRC")) {
    mtcars.new <- mtcars
    
    ntree<-500
    da<-mtcars.new[,3:10] 
    mc.cf <- cforest(carb+ mpg+ cyl ~., data = mtcars.new,
                     control = cforest_unbiased(mtry = 8, ntree = 500))
    
    #IPM case-wise computed with OOB with party
    pupf<-ipmparty(mc.cf ,da,ntree) 
    
    #global IPM
    pua<-apply(pupf,2,mean) 
    pua
    
    #disp and hp are consistently selected as more important if repeated
}


aleixalcacer/IPMRF documentation built on April 23, 2022, 3:50 a.m.