gradientForest: Create gradientForest objects


Description

This function uses the modified extendedForest package to conduct analysis on benthic data in the CoML synthesis project.

Usage

gradientForest(data, predictor.vars, response.vars, ntree = 10, mtry = NULL,
               transform = NULL, maxLevel = 0, corr.threshold = 0.5,
               compact = FALSE, nbin = 101, trace = FALSE)

Arguments

data

data.frame in which rows identify sites and columns contain either response variables (usually species catch, in numbers or weight) or predictor variables (such as physical or chemical variables). Column names identify the species or the specific predictor variable. If the species are numeric variables, a regression forest is calculated. If the species are factor variables, a classification forest is calculated.

predictor.vars

vector identifying which columns containing predictor variables (e.g., physical variables) are to be used in the randomForest analysis. This vector can contain column names (as character strings) or column numbers.

response.vars

vector identifying which species are to be used in the randomForest analysis. This vector can contain column names (as character strings) or column numbers.

ntree

number of bootstrapped trees to be generated by randomForest. Default set to 10 trees.

mtry

number of predictor variables randomly sampled as candidates at each split. Setting to NULL accepts the default values. Note that the defaults differ between classification (sqrt(p), where p is the number of predictor variables) and regression (p/3).

transform

a function defining a transformation to be applied to the species data. For example, a square-root transformation would be entered as transform=function(x){sqrt(x)}. Default set to no transformation.

maxLevel

if maxLevel == 0, compute importance from marginal permutation distribution of each variable (the default). If maxLevel > 0, compute importance from conditional permutation distribution of each variable, permuted within 2^{maxLevel} partitions of correlated variables.

corr.threshold

if maxLevel > 0, OOB permuting is conditioned on partitions of variables having absolute correlation > corr.threshold.

compact

logical variable to choose between the standard method (FALSE) and the compact method (TRUE) for aggregating importance measures across species. Set compact=TRUE when memory problems cause this function to crash. Still experimental.

nbin

number of bins for compact option. Default set to 101.

trace

if TRUE show the progress. Default FALSE.
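
Several of the arguments above can be illustrated in base R without calling the package. The sketch below assumes the floor rounding that randomForest applies to its mtry defaults; the predictor count and the maxLevel value are hypothetical.

```r
# Hypothetical number of predictor variables, for illustration only.
p <- 12

# Defaults applied when mtry = NULL (assuming randomForest-style rounding).
mtry_classification <- floor(sqrt(p))
mtry_regression     <- max(floor(p / 3), 1)

# A transform is any function of the species data; here a square root,
# equivalent to transform = function(x){sqrt(x)} in the argument list.
sqrt_transform <- function(x) sqrt(x)
transformed <- sqrt_transform(c(0, 4, 9))

# With maxLevel > 0, values are permuted within 2^maxLevel partitions.
maxLevel <- 2
n_partitions <- 2^maxLevel
```

With 12 predictors this gives 3 candidate variables per split for classification and 4 for regression.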

Details

gradientForest uses extendedForest, an extended version of the package randomForest (Liaw and Wiener 2002) that retains all of the split values and fit improvements for each of the response variables (species catches in our case) for further analysis. For each predictor in each tree and each forest, gradientForest collates the split values along each gradient, together with their associated fit improvements for each species, as retained by extendedForest. Details on the method are given in Ellis et al. (2012) and applications are described in Pitcher et al. (2012).
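
The idea of collating split values and their fit improvements along a gradient can be sketched on toy data in base R. This is only an illustration of the concept, not the package's implementation, which is more involved (Ellis et al. 2012); the `splits` data frame, its column names, and all its values are invented.

```r
# Toy split records: one row per split, with the predictor it was made on,
# the split value, and the fit improvement. All values are invented.
splits <- data.frame(
  var     = c("depth", "depth", "depth", "temp", "temp"),
  split   = c(10, 25, 25, 4.5, 6.0),
  improve = c(0.2, 0.5, 0.1, 0.3, 0.4)
)

# Sum the improvement at each distinct split value along each gradient.
collated <- aggregate(improve ~ var + split, data = splits, FUN = sum)
collated <- collated[order(collated$var, collated$split), ]

# Accumulate improvement along each gradient, scaled to end at 1.
collated$cum.imp <- ave(collated$improve, collated$var,
                        FUN = function(x) cumsum(x) / sum(x))
```

Large jumps in the accumulated improvement mark parts of the gradient where splits contribute most to model fit.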

Value

X

data.frame of predictor variables.

Y

data.frame of response variables.

result

A named vector of species R^2 for those species for which the physical variables have some predictive power. For regression, these are for forests having positive R^2. For classification, these are for forests having error rate less than the base error rate 2*p*(1-p) with p the proportion of presences.

overall.imp

The mean raw accuracy importance, one value per physical variable.

overall.imp2

The mean raw impurity importance, one value per physical variable.

ntree

number of bootstrap trees generated by randomForest.

species.pos.rsq

The number of species for which the physical variables have some predictive power.

res

Amalgamated data frame containing splits for all species and physical variables, as well as certain derived quantities.

res.u

Unique rows of res with columns restricted to "spec", "var", "rsq", "improve.tot" and "improve.tot.var"

dens

List of Gaussian kernel density estimates for each physical variable.

call

the matched call
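
The base error rate quoted under `result` can be worked through by hand. The sketch below uses an invented presence/absence vector and an invented forest error rate, purely to show how the threshold 2*p*(1-p) is applied.

```r
# Hypothetical presence/absence response for one species across 10 sites.
presence <- factor(c(1, 0, 0, 1, 0, 0, 0, 1, 0, 0))

# Proportion of presences and the resulting base error rate.
p <- mean(presence == "1")
base_error <- 2 * p * (1 - p)

# A classification forest counts as having predictive power only if its
# error rate falls below this baseline.
forest_error <- 0.35  # invented value, for illustration
has_power <- forest_error < base_error
```

Here p is 0.3, so the base error rate is 0.42 and the invented forest error of 0.35 falls below it.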

Author(s)

Original code written by N. Ellis, CSIRO, Cleveland, Australia. Modified by S.J. Smith, DFO, Dartmouth, NS, Canada.

References

Breiman, L. (2001) Random Forests. Machine Learning, 45(1), 5–32.

Ellis, N., Smith, S.J., and Pitcher, C.R. (2012) Gradient Forests: calculating importance gradients on physical predictors. Ecology, 93, 156–168.

Liaw, A. and Wiener, M. (2002) Classification and regression by randomForest. R News, 2(3), 18–22. http://CRAN.R-project.org/doc/Rnews/

Pitcher, C.R., Lawton, P., Ellis, N., Smith, S.J., Incze, L.S., Wei, C-L., Greenlaw, M.E., Wolff, N.H., Sameoto, J.A., and Snelgrove, P.V.R. (2012) Exploring the role of environmental variables in shaping patterns of biodiversity composition in seabed assemblages. Journal of Applied Ecology, 49, 670–679.

Strobl, C., Boulesteix, A.-L., Kneib, T., Augustin, T., and Zeileis, A. (2008) Conditional variable importance for random forests. BMC Bioinformatics, 9, 307–317. Open Access: http://www.biomedcentral.com/1471-2105/9/307

See Also

print.gradientForest, plot.gradientForest

Examples
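
The following is a minimal sketch of a call, built only from the argument list in Usage and the components listed under Value. The simulated data frame `sim` and its column names are hypothetical, and the call is guarded so that it runs only when the gradientForest package (and its extendedForest dependency) is installed.

```r
# Simulated sites: two predictors and two species whose counts depend on
# them. All names and values are invented for illustration.
set.seed(1)
n <- 50
sim <- data.frame(
  depth = runif(n, 5, 100),
  temp  = runif(n, 2, 12)
)
sim$sp1 <- rpois(n, lambda = exp(0.03 * sim$depth))
sim$sp2 <- rpois(n, lambda = 2 * sim$temp)

if (requireNamespace("gradientForest", quietly = TRUE)) {
  gf <- gradientForest::gradientForest(
    data           = sim,
    predictor.vars = c("depth", "temp"),
    response.vars  = c("sp1", "sp2"),
    ntree          = 50,
    transform      = function(x) sqrt(x)  # square-root transform of catches
  )
  print(gf)       # uses the print method listed under See Also
  gf$result       # species R^2, as described under Value
  gf$overall.imp  # mean accuracy importance per predictor
}
```

Species are simulated with a clear dependence on the predictors because `result` only reports species for which the predictors have some predictive power.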


slarge/gradientForest documentation built on May 3, 2019, 4:05 p.m.