Z-score variable importance for hrf and htb
varimp_hrf(object,nperm=20,parallel=TRUE)
varimp_htb(object,nperm=20)
object: Return list from
nperm: Number of permutations.
parallel: If
To measure the importance of a predictor, varimp_hrf and varimp_htb compare the prediction errors of the estimated model with the prediction errors obtained after integrating the predictor out of the model. If F denotes the estimated model, the model obtained by integrating out predictor k is F_k(x) = \int F(x) dP(x_k), where P(x_k) is the marginal distribution of x_k. In practice, the integration is done by averaging over multiple predictions from F, each obtained using a random permutation of the observed values of x_k. The number of permutations is set by nperm. Letting L(y, y_{hat}) be the loss of predicting y with y_{hat}, the vector w_i = L(y_i, F_k(x_i)) - L(y_i, F(x_i)) for i=1,..,n gives the difference in prediction error between the marginalized and original models. The corresponding z-score z = mean(w_i)/se(w_i) corresponds to a paired test for the equality of the prediction errors; under the null hypothesis of equal prediction errors it is approximately distributed as N(0,1). Large positive z-scores indicate that the prediction error increases if x_k is marginalized out, and thus that x_k is useful. Large negative z-scores, on the other hand, indicate that the integrated model is more accurate. For longitudinal data, the w_i are computed by averaging across all observations from the i-th subject. For htb the prediction error is calculated from the cross-validation model estimates; for hrf out-of-bag predictions are used.
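The permutation scheme described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: lm() stands in for the fitted model F, squared error stands in for the loss L, the data are simulated, in-sample predictions are used instead of out-of-bag or cross-validated ones, and se(w_i) is taken to be the standard error of the mean. The helper names sq_loss and z_importance are hypothetical.

```r
# Sketch only: lm() is a stand-in for the fitted model F; hrf/htb are not used.
set.seed(1)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 2 * dat$x1 + rnorm(n)            # x1 drives y, x2 is pure noise

fit <- lm(y ~ x1 + x2, data = dat)        # stand-in for F
sq_loss <- function(y, yhat) (y - yhat)^2 # assumed loss L (hypothetical helper)

# z-score for predictor k (hypothetical helper): average predictions over
# nperm random permutations of column k to approximate F_k, then compare
# per-observation losses of the marginalized and original models.
z_importance <- function(fit, dat, k, nperm = 20) {
  pred_marg <- rowMeans(sapply(seq_len(nperm), function(b) {
    d <- dat
    d[[k]] <- sample(d[[k]])              # permute the observed values of x_k
    predict(fit, newdata = d)
  }))
  w <- sq_loss(dat$y, pred_marg) - sq_loss(dat$y, predict(fit, newdata = dat))
  mean(w) / (sd(w) / sqrt(length(w)))     # z = mean(w_i) / se(w_i)
}

z <- sapply(c("x1", "x2"), function(k) z_importance(fit, dat, k))
z
```

In this simulated setting the z-score for x1 should be large and positive, while the z-score for the noise predictor x2 should be small in magnitude.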
A data.frame with columns: Predictor, the predictor being marginalized; Marginalized error, the prediction error of the model with Predictor marginalized out; Model error, the prediction error of the original model; Relative change, the relative change in prediction error due to marginalization; Z-value, the Z value from the test comparing the prediction errors of the original and marginalized models.
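Because the Z value is approximately N(0,1) when the two prediction errors are equal, it can be converted to an approximate one-sided p-value with pnorm(). The snippet below is a minimal sketch: the z-scores are made-up numbers for illustration, not output from varimp_hrf or varimp_htb, and the predictor names are hypothetical.

```r
# Hypothetical z-scores, invented for illustration only
z <- c(x1 = 3.1, x2 = -0.4, x3 = 1.2)

# One-sided p-value for the alternative that marginalizing the predictor
# increases the prediction error (upper tail of the standard normal)
p_one_sided <- pnorm(z, lower.tail = FALSE)
round(p_one_sided, 4)
```

A small p-value supports the predictor being useful; a p-value near 1 corresponds to a large negative z-score, where the marginalized model predicts better.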
L. Breiman (2001). “Random Forests,” Machine Learning 45(1):5-32.
## Not run:
data(mscm)
mscm=as.data.frame(na.omit(mscm))
# -- set concurrent and historical predictors
historical_predictors=match(c("stress","illness"),names(mscm))
concurrent_predictors=which(names(mscm)!="stress")
control=list(vh=historical_predictors,vc=concurrent_predictors,nodesize=20)
## -- fit model
ff=hrf(x=mscm,id=mscm$id,time=mscm$day,yindx="illness",control=control)
# -- variable importance table
vi=varimp_hrf(ff)
vi
## same with htb
control=list(vh=historical_predictors,vc=concurrent_predictors,
lambda=.1,ntrees=200,nsplit=3,family="bernoulli")
control$cvfold=10 ## need cross-validation runs to run varimp_htb
ff=htb(x=mscm,id=mscm$id,time=mscm$day,yindx="illness",control=control)
# -- variable importance table
vi=varimp_htb(ff)
vi
# --------------------------------------------------------------------------------------------- ##
# Boston Housing data
# Comparison of Z-score variable importance with coefficient Z-scores from linear model
# --------------------------------------------------------------------------------------------- ##
library(mlbench)
data(BostonHousing)
dat=as.data.frame(na.omit(BostonHousing))
dat$chas=as.numeric(dat$chas)
# -- random forest
h=hrf(x=dat,yindx="medv")
# -- tree boosting
hb=htb(x=dat,yindx="medv",ntrees=1000,cv.fold=10,nsplit=3)
# -- Comparison of variable importance Z-scores and Z-scores from linear model
vi=varimp_hrf(h)
vb=varimp_htb(hb)
dvi=data.frame(var=as.character(vi$Predictor),Z_hrf=vi$Z)
dvb=data.frame(var=as.character(vb$Predictor),Z_htb=vb$Z)
dlm=summary(lm(medv~.,dat))$coefficients
dlm=data.frame(var=rownames(dlm),Z_lm=round(abs(dlm[,3]),3))
dlm=merge(dlm[-1,],dvi,by="var",all.x=TRUE)
# -- Z-scores of hrf and lm for predictor variables
merge(dlm,dvb,by="var",all.x=TRUE)
## End(Not run)