rf.effectSize: Random Forest effect size

Description Usage Arguments Value Note Author(s) References See Also Examples

View source: R/rf.effectSize.R

Description

Parameter effect size based on partial dependency (Cafri & Bailey, 2016)

Usage

1

Arguments

x

A randomForest model object

y

A vector represent the independent variable of intrests

...

Arguments passed to the partial dependency function, requires x.var, pred.data,

Value

A vector (single value) of the parameter effect size

Note

Effect size based on partial dependency and parameter-weighted OLS (does not support factoral or dichotomous variables) The algorithm follows: 1) Grow a forest 2) Estimate partial dependence (for a single variable). a. Create datasets for all observation in the dataset only let them take on one value for the variable of interest while keeping values of all other variables unchanged. b. Pass the dataset through each tree and average the predictions over the trees in the forest. 3) Construct a point estimate of the proposed effect size by fitting a weighted least squares model with response based on the tree-averaged predicted values obtained in Step 2, the explanatory variable corresponding to the value used to generate each tree-averaged prediction, and weight based on the frequency each value the explanatory variable takes on in the original data. 4) For confidence intervals, repeat Steps 1-3 for as many bootstrap samples as desired Modified partialPlot function uses distinct X values to construct partial dependence for non-factor variables

Author(s)

Jeffrey S. Evans <jeffrey_evans<at>tnc.org>

References

Cafri, G., B.A. Bailey (2016) Understanding Variable Effects from Black Box Prediction: Quantifying Effects in Tree Ensembles Using Partial Dependence. Journal of Data Science 14:67-96

See Also

partialPlot for ... options

randomForest for randomForest details

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
 library(randomForest)
   data(airquality)
   airquality <- na.omit(airquality)
     fit.reg <- randomForest(Ozone ~ ., data=airquality)
 
 # Parameter effect sizes	
 rf.effectSize(fit.reg, y = airquality$Solar.R, pred.data = airquality, x.var = Solar.R)
 rf.effectSize(fit.reg, y = airquality$Wind, pred.data = airquality, x.var = Wind)
 rf.effectSize(fit.reg, y = airquality$Temp, pred.data = airquality, x.var = Temp)
 rf.effectSize(fit.reg, y = airquality$Month, pred.data = airquality, x.var = Month)
 rf.effectSize(fit.reg, y = airquality$Day, pred.data = airquality, x.var = Day)
 	
## Not run: 
 # Bootstrap of effect size for Wind and Temp
 B = 999
 n = nrow(airquality)
 es.boot.wind <- vector()
 es.boot.temp <- vector()
   for(i in 1:B) {
     boot.samples <- airquality[sample(1:nrow(airquality), n, replace = TRUE),]
       fmla <- stats::as.formula(paste(paste("Ozone", "~", sep=""), paste(".", collapse= "")))   
         fit <- randomForest(fmla, data = boot.samples)
     es.boot.wind <- append(es.boot.wind, rf.effectSize(fit, y = boot.samples$Wind, 
 	                       pred.data = boot.samples, x.var = Wind))
     es.boot.temp <- append(es.boot.temp, rf.effectSize(fit, y = boot.samples$Temp, 
 	                       pred.data = boot.samples,x.var = Temp))
   }        
  se <- function(x) sqrt(var(x, na.rm = TRUE) / length(na.omit(x)))
    cat("Bootstrap variance for Wind:", var(es.boot.wind), "\n")
      cat("Bootstrap standard error for Wind:", se(es.boot.wind), "\n","\n")
    cat("Bootstrap variance for Temp:", var(es.boot.temp), "\n")
  cat("Bootstrap standard error for Temp:", se(es.boot.temp), "\n")

 # Confidence intervals of Bootstrap of effect size for Wind
 p=0.95
 y <- sort(es.boot.wind) 
 x <- 1:length(y)
   plx <- stats::predict(stats::loess(y ~ x), se=TRUE)
   lci = plx$fit - stats::qt(p, plx$df) * plx$se.fit
   uci = plx$fit + stats::qt(p, plx$df) * plx$se.fit
       graphics::plot(x, y, type="n", main="Effect size Bootstrap CI for Wind", 
 	    sub=paste("confidence intervals at", p))
       graphics::polygon(c(x,rev(x)), c(lci, rev(uci)), col="grey86")
       graphics::points(x, y, pch=20, cex=0.70)
       graphics::lines(x, plx[["fit"]], lty=3)
 
 # Confidence intervals of Bootstrap of effect size for Temp
 p=0.95
 y <- sort(es.boot.temp) 
 x <- 1:length(y)
   plx <- stats::predict(stats::loess(y ~ x), se=TRUE)
   lci = plx$fit - stats::qt(p, plx$df) * plx$se.fit
   uci = plx$fit + stats::qt(p, plx$df) * plx$se.fit
       graphics::plot(x, y, type="n", main="Effect size Bootstrap CI for Temp", 
 	    sub=paste("confidence intervals at", p))
       graphics::polygon(c(x,rev(x)), c(lci, rev(uci)), col="grey86")
       graphics::points(x, y, pch=20, cex=0.70)
       graphics::lines(x, plx[["fit"]], lty=3)
 	  
 # Plot bootstrap of wind effect size
 pdf <- density(es.boot.wind)
 plot(pdf, type="n", main="Bootstrap of effect size wind (n=99)", 
      ylab="p", xlab="effect size")
   polygon(pdf, col="grey")
   abline(v=mean(es.boot.wind))
     abline(v=mean(es.boot.wind)-sd(es.boot.wind), col="blue", lty=3)
     abline(v=mean(es.boot.wind)+sd(es.boot.wind), col="blue", lty=3)	  
 
 # Plot bootstrap of temp effect size	
 pdf <- density(es.boot.temp)
 plot(pdf, type="n", main="Bootstrap of effect size temp (n=99)", 
      ylab="p", xlab="effect size")
   polygon(pdf, col="grey")	
   abline(v=mean(es.boot.temp))
     abline(v=mean(es.boot.temp)-sd(es.boot.temp), col="blue", lty=3)
     abline(v=mean(es.boot.temp)+sd(es.boot.temp), col="blue", lty=3)

## End(Not run)

rfUtilities documentation built on Oct. 3, 2019, 9:04 a.m.