varimp_hrf: Variable importance


View source: R/hrf.R

Description

Z-score variable importance for hrf and htb

Usage

varimp_hrf(object, nperm = 20, parallel = TRUE)
varimp_htb(object, nperm = 20)

Arguments

object

An object returned by hrf or htb.

nperm

Number of random permutations used to approximate the marginal integration.

parallel

If TRUE, run in parallel (varimp_hrf only).

Details

To measure the importance of a predictor, varimp_hrf and varimp_htb compare the prediction error of the estimated model with the prediction error obtained after integrating the predictor out of the model. If F denotes the estimated model, the model obtained by integrating out predictor k is F_k(x) = \int F(x) dP(x_k), where P(x_k) is the marginal distribution of x_k. In practice, the integration is approximated by averaging multiple predictions from F, each obtained using a random permutation of the observed values of x_k. The number of permutations is set by nperm.

Letting L(y, y_hat) denote the loss of predicting y with y_hat, the vector w_i = L(y_i, F_k(x_i)) - L(y_i, F(x_i)) for i = 1, ..., n gives the difference in prediction error between the marginalized and original models. The corresponding z-score z = mean(w_i)/se(w_i) is the statistic of a paired test for equality of the two prediction errors, and under that null hypothesis it is approximately distributed as N(0,1). Large positive z-scores indicate that the prediction error increases when x_k is marginalized out, and thus that x_k is useful; large negative values indicate that the marginalized model is more accurate. For longitudinal data, the w_i are computed by averaging across all observations from the i-th subject. For htb the prediction error is computed from the cross-validation model estimates; for hrf, out-of-bag predictions are used.
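The permutation scheme above can be sketched in a few lines of R. This is an illustration only, not htree's internal code: it uses a plain linear model and squared-error loss, and the names (zscore, base_loss) are made up for the example. The real functions use out-of-bag (hrf) or cross-validation (htb) predictions rather than in-sample fits.

```r
## Minimal sketch of permutation Z-score importance (illustrative only).
## Marginalize x_k by permuting its column, then compare per-observation losses.
set.seed(1)
n <- 200
x <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
y <- 2 * x$x1 + rnorm(n)            # only x1 carries signal
f <- lm(y ~ x1 + x2, data = x)      # stand-in for the fitted model F

zscore <- function(k, nperm = 20) {
  base_loss <- (y - predict(f, x))^2
  # Approximate F_k by averaging predictions over nperm permutations of column k
  pk <- rowMeans(sapply(seq_len(nperm), function(i) {
    xp <- x
    xp[[k]] <- sample(xp[[k]])
    predict(f, xp)
  }))
  w <- (y - pk)^2 - base_loss       # per-observation loss difference w_i
  mean(w) / (sd(w) / sqrt(n))       # paired z-statistic mean(w)/se(w)
}

zscore("x1")   # large positive: error grows when x1 is marginalized out
zscore("x2")   # near zero: x2 is uninformative
```

A large z for x1 and a near-zero z for x2 mirror the interpretation in the paragraph above: permuting an informative predictor inflates the loss, while permuting an irrelevant one barely changes it.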

Value

A data.frame with one row per predictor and columns: Predictor, the predictor being marginalized; Marginalized error, the prediction error of the model with Predictor marginalized out; Model error, the prediction error of the original model; Relative change, the relative change in prediction error due to marginalization; Z value, the Z-statistic from the paired test comparing the prediction errors of the original and marginalized models.

References

L. Breiman (2001). “Random Forests,” Machine Learning 45(1):5-32.

See Also

hrf, htb

Examples

## Not run: 

data(mscm) 
mscm=as.data.frame(na.omit(mscm))


# -- set concurrent and historical predictors 
historical_predictors=match(c("stress","illness"),names(mscm))
concurrent_predictors=which(names(mscm)!="stress")
control=list(vh=historical_predictors,vc=concurrent_predictors,nodesize=20)

## -- fit model
ff=hrf(x=mscm,id=mscm$id,time=mscm$day,yindx="illness",control=control)

# -- variable importance table
vi=varimp_hrf(ff)
vi


## same with htb

control=list(vh=historical_predictors,vc=concurrent_predictors,
	lambda=.1,ntrees=200,nsplit=3,family="bernoulli")
control$cvfold=10 ## need cross-validation runs to run varimp_htb
ff=htb(x=mscm,id=mscm$id,time=mscm$day,yindx="illness",control=control)

# -- variable importance table
vi=varimp_htb(ff)
vi




# --------------------------------------------------------------------------------------------- ##
# Boston Housing data 
#	Comparison of Z-score variable importance with coefficient Z-scores from linear model
# --------------------------------------------------------------------------------------------- ##

# Boston Housing data 
library(mlbench)
data(BostonHousing)
dat=as.data.frame(na.omit(BostonHousing))
dat$chas=as.numeric(dat$chas)

# -- random forest 
h=hrf(x=dat,yindx="medv")


# -- tree boosting
hb=htb(x=dat,yindx="medv",ntrees=1000,cv.fold=10,nsplit=3)


# -- Comparison of variable importance Z-scores and Z-scores from linear model 
vi=varimp_hrf(h)
vb=varimp_htb(hb)
dvi=data.frame(var=as.character(vi$Predictor),Z_hrf=vi$Z)
dvb=data.frame(var=as.character(vb$Predictor),Z_htb=vb$Z)

dlm=summary(lm(medv~.,dat))$coefficients
dlm=data.frame(var=rownames(dlm),Z_lm=round(abs(dlm[,3]),3))
dlm=merge(dlm[-1,],dvi,by="var",all.x=TRUE)

# -- Z-scores of hrf and lm for predictor variables 
merge(dlm,dvb,by="var",all.x=TRUE)




## End(Not run)

htree documentation built on May 1, 2019, 9:11 p.m.
