rf.modelSel: Random Forest Model Selection



Description

Implements the Random Forests model selection approach of Murphy et al. (2010).

Usage

rf.modelSel(xdata, ydata, imp.scale = "mir", r = c(0.25, 0.5, 0.75),
  final.model = FALSE, seed = NULL, parsimony = NULL, ...)

Arguments

xdata

X Data for model

ydata

Y Data for model

imp.scale

Type of scaling for importance values ("mir" or "se"); default is "mir"

r

Vector of importance percentiles to test, e.g., c(0.1, 0.2, 0.5, 0.7, 0.9)

final.model

Run final model with selected variables (TRUE/FALSE)

seed

Sets the random seed in the R global environment. Setting a seed is strongly recommended so that results are reproducible.

parsimony

Threshold for competing model (0-1)

...

Additional arguments to pass to randomForest (e.g., ntree=1000, replace=TRUE, proximity=TRUE)
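The r argument can be read as a set of quantile cut-offs on the scaled importance values. The following is a minimal sketch of that idea, assuming each percentile keeps the variables whose importance is at or above the corresponding quantile. The importance values and the names imp and subsets are made up for illustration, and the package's internal thresholding may differ.

```r
# Hypothetical scaled importance values for four predictors
imp <- c(Sepal.Length = 0.30, Sepal.Width = 0.05,
         Petal.Length = 1.00, Petal.Width = 0.90)

# Percentiles to test (the default value of r)
r <- c(0.25, 0.5, 0.75)

# Assumption: for each percentile, keep the variables whose importance
# is at or above that quantile of the importance distribution
subsets <- lapply(r, function(p) {
  names(imp)[imp >= quantile(imp, probs = p)]
})
names(subsets) <- paste0("r=", r)

subsets
```

Under this reading, higher percentiles yield smaller candidate models, and each subset would be fit and compared using the selection criteria described under Details.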

Details

If you want to run classification, make sure that y is a factor; otherwise the randomForest model runs in regression mode. For classification problems, the model selection criteria are: smallest OOB error, smallest maximum within-class error, and fewest parameters. For regression problems, the model selection criteria are: largest

The "mir" scale option performs a row standardization and the "se" option performs normalization using the "standard errors" of the permutation-based importance measure. Both options result in a 0-1 range, but "se" sums to 1. The scaled importance measures are calculated as: mir = i/max(i) and se = (i / se) / ( sum(i) / se).

The parsimony argument is the percent of allowable error surrounding competing models. For example, suppose there are two competing models, a selected model with 5 parameters and a competing model with 3 parameters, and parsimony = 0.05. If the error of the competing model is within +/- 5% of the selected model's error, the model with fewer parameters is selected as the final model.
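The scaling and parsimony rules above can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: the importance values, standard errors, and candidate models are invented, the "se" formula is read as (i / se) / sum(i / se), and the parsimony threshold is treated as an absolute difference in error on a 0-1 scale.

```r
# Hypothetical raw permutation importances and their standard errors
i  <- c(x1 = 12.0, x2 = 3.0, x3 = 6.0)
se <- c(x1 = 2.0,  x2 = 1.5, x3 = 1.0)

# "mir": divide by the maximum, so the top variable scores 1
mir <- i / max(i)

# "se": standardize each importance by its standard error, then
# normalize so the values sum to 1 (assumed reading of the formula)
z <- i / se
se_scaled <- z / sum(z)

# Hypothetical candidate models: error rate and number of parameters
models <- data.frame(error = c(0.040, 0.060, 0.120),
                     npar  = c(5, 3, 2))
parsimony <- 0.05

# Assumption: a competing model qualifies when its error is within
# `parsimony` of the lowest error; among qualifiers, fewest parameters wins
best <- min(models$error)
candidates <- models[models$error - best <= parsimony, ]
final <- candidates[which.min(candidates$npar), ]

mir
se_scaled
final
```

Here the 3-parameter model replaces the 5-parameter model because its error is only 0.02 higher, while the 2-parameter model is excluded because its error exceeds the allowed margin.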

Value

A list class object with the following components:

Author(s)

Jeffrey S. Evans <jeffrey_evans@tnc.org>

References

Evans, J.S. and S.A. Cushman (2009) Gradient Modeling of Conifer Species Using Random Forest. Landscape Ecology 24(5):673-683.

Murphy, M.A., J.S. Evans, and A.S. Storfer (2010) Quantifying Bufo boreas connectivity in Yellowstone National Park with landscape genetics. Ecology 91:252-261.

Evans, J.S., M.A. Murphy, Z.A. Holden, and S.A. Cushman (2011) Modeling species distribution and change using Random Forests. Chapter 8 in Predictive Modeling in Landscape Ecology, eds. C.A. Drew, F. Huettmann, and Y. Wiersma. Springer.

See Also

randomForest for the model options that can be passed through ...

Examples

# Classification on iris data
library(randomForest)
library(rfUtilities)
data(iris)
iris$Species <- as.factor(iris$Species)  # y must be a factor for classification
( rf.class <- rf.modelSel(iris[,1:4], iris[,"Species"], seed=1234, imp.scale="mir") )
( rf.class <- rf.modelSel(iris[,1:4], iris[,"Species"], seed=1234, imp.scale="mir",
                          parsimony=0.03) )

plot(rf.class)              # plot importance for selected variables
plot(rf.class, imp = "all") # plot importance for all variables

# Refit a random forest using only the selected variables
vars <- rf.class$selvars
( rf.fit <- randomForest(x=iris[,vars], y=iris[,"Species"]) )

# Regression on airquality data
data(airquality)
airquality <- na.omit(airquality)  # drop rows with missing values
( rf.regress <- rf.modelSel(airquality[,2:6], airquality[,1], imp.scale="se") )
( rf.regress <- rf.modelSel(airquality[,2:6], airquality[,1], imp.scale="se",
                            parsimony=0.03) )

plot(rf.regress)              # plot importance for selected variables
plot(rf.regress, imp = "all") # plot importance for all variables

# To use parameters from competing model
vars <- rf.regress$parameters[[3]]

# To use parameters from selected model (overwrites the assignment above)
vars <- rf.regress$selvars

( rf.fit <- randomForest(x=airquality[,vars], y=airquality[,1]) )

rfUtilities documentation built on Oct. 3, 2019, 9:04 a.m.