varimp_hrf: Variable importance


View source: R/hrf.R

Description

Z-score variable importance for hrf and htb

Usage

varimp_hrf(object, nperm = 20, parallel = TRUE)
varimp_htb(object, nperm = 20)

Arguments

object

An object returned by hrf or htb.

nperm

Number of random permutations used to approximate the marginal integration.

parallel

If TRUE, run in parallel (varimp_hrf only).

Details

To measure the importance of a predictor, varimp_hrf and varimp_htb compare the prediction error of the estimated model with the prediction error obtained after integrating the predictor out of the model. If F denotes the estimated model, the model obtained by integrating out predictor k is F_k(x) = \int F(x) dP(x_k), where P(x_k) is the marginal distribution of x_k. In practice, the integration is approximated by averaging multiple predictions from F, each obtained using a random permutation of the observed values of x_k. The number of permutations is set by nperm.

Letting L(y, y_hat) denote the loss of predicting y with y_hat, the vector w_i = L(y_i, F_k(x_i)) - L(y_i, F(x_i)) for i = 1, ..., n gives the difference in prediction error between the marginalized and original models. The corresponding z-score z = mean(w_i)/se(w_i) is the statistic of a paired test for equality of the two prediction errors, and under that null hypothesis it is approximately distributed as N(0,1). Large positive z-scores indicate that the prediction error increases when x_k is marginalized out, and thus that x_k is useful; large negative values indicate that the marginalized model is more accurate. For longitudinal data, the w_i are computed by averaging across all observations from the i-th subject. For htb the prediction error is computed from the cross-validation model estimates; for hrf, out-of-bag predictions are used.
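The permutation scheme above can be sketched in a few lines of R. This is an illustration only, not htree's internal code: it uses a plain linear model and squared-error loss, and the names (zscore, base_loss) are made up for the example. The real functions use out-of-bag (hrf) or cross-validation (htb) predictions rather than in-sample fits.

```r
## Minimal sketch of permutation Z-score importance (illustrative only).
## Marginalize x_k by permuting its column, then compare per-observation losses.
set.seed(1)
n <- 200
x <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
y <- 2 * x$x1 + rnorm(n)            # only x1 carries signal
f <- lm(y ~ x1 + x2, data = x)      # stand-in for the fitted model F

zscore <- function(k, nperm = 20) {
  base_loss <- (y - predict(f, x))^2
  # Approximate F_k by averaging predictions over nperm permutations of column k
  pk <- rowMeans(sapply(seq_len(nperm), function(i) {
    xp <- x
    xp[[k]] <- sample(xp[[k]])
    predict(f, xp)
  }))
  w <- (y - pk)^2 - base_loss       # per-observation loss difference w_i
  mean(w) / (sd(w) / sqrt(n))       # paired z-statistic mean(w)/se(w)
}

zscore("x1")   # large positive: error grows when x1 is marginalized out
zscore("x2")   # near zero: x2 is uninformative
```

A large z for x1 and a near-zero z for x2 mirror the interpretation in the paragraph above: permuting an informative predictor inflates the loss, while permuting an irrelevant one barely changes it.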

Value

A data.frame with one row per predictor and columns: Predictor, the predictor being marginalized; Marginalized error, the prediction error of the model with Predictor marginalized out; Model error, the prediction error of the original model; Relative change, the relative change in prediction error due to marginalization; Z value, the Z-statistic from the paired test comparing the prediction errors of the original and marginalized models.

References

L. Breiman (2001). “Random Forests,” Machine Learning 45(1):5-32.

See Also

hrf, htb

Examples

## Not run: 

data(mscm) 
mscm=as.data.frame(na.omit(mscm))


# -- set concurrent and historical predictors 
historical_predictors=match(c("stress","illness"),names(mscm))
concurrent_predictors=which(names(mscm)!="stress")
control=list(vh=historical_predictors,vc=concurrent_predictors,nodesize=20)

## -- fit model
ff=hrf(x=mscm,id=mscm$id,time=mscm$day,yindx="illness",control=control)

# -- variable importance table
vi=varimp_hrf(ff)
vi


## same with htb

control=list(vh=historical_predictors,vc=concurrent_predictors,
	lambda=.1,ntrees=200,nsplit=3,family="bernoulli")
control$cvfold=10 ## need cross-validation runs to run varimp_htb
ff=htb(x=mscm,id=mscm$id,time=mscm$day,yindx="illness",control=control)

# -- variable importance table
vi=varimp_htb(ff)
vi




# --------------------------------------------------------------------------------------------- ##
# Boston Housing data 
#	Comparison of Z-score variable importance with coefficient Z-scores from linear model
# --------------------------------------------------------------------------------------------- ##

# Boston Housing data 
library(mlbench)
data(BostonHousing)
dat=as.data.frame(na.omit(BostonHousing))
dat$chas=as.numeric(dat$chas)

# -- random forest 
h=hrf(x=dat,yindx="medv")


# -- tree boosting
hb=htb(x=dat,yindx="medv",ntrees=1000,cv.fold=10,nsplit=3)


# -- Comparison of variable importance Z-scores and Z-scores from linear model 
vi=varimp_hrf(h)
vb=varimp_htb(hb)
dvi=data.frame(var=as.character(vi$Predictor),Z_hrf=vi$Z)
dvb=data.frame(var=as.character(vb$Predictor),Z_htb=vb$Z)

dlm=summary(lm(medv~.,dat))$coefficients
dlm=data.frame(var=rownames(dlm),Z_lm=round(abs(dlm[,3]),3))
dlm=merge(dlm[-1,],dvi,by="var",all.x=TRUE)

# -- Z-scores of hrf and lm for predictor variables 
merge(dlm,dvb,by="var",all.x=TRUE)




## End(Not run)

htree documentation built on May 1, 2019, 9:11 p.m.
