gradientForest: Create gradientforest objects
In slarge/gradientForest: Random Forest functions for the Census of Marine Life synthesis project.


This function uses the modified extendedForest package to conduct analysis on benthic data in the CoML synthesis project.

gradientForest(data, predictor.vars, response.vars, ntree=10, mtry=NULL, transform=NULL,
               maxLevel=0, corr.threshold=0.5, compact=FALSE, nbin=101, trace=FALSE)

`data`	data.frame in which rows identify sites and columns contain response variables (usually species catch, as numbers or weight) or predictor variables (such as physical or chemical variables). Column names identify the species or the specific predictor variable. If the species are numeric variables, a regression forest is calculated. If the species are factor variables, a classification forest is calculated.
`predictor.vars`	vector identifying which columns contain the predictor variables (e.g., physical variables) to be used in the randomForest analysis. This vector can contain column names (as character) or column numbers.
`response.vars`	vector identifying which species are to be used in the randomForest analysis. This vector can contain column names (as character) or column numbers.
`ntree`	number of bootstrapped trees to be generated by randomForest. Default set to 10 trees.
`mtry`	number of predictor variables randomly sampled as candidates at each split. Setting to NULL accepts default values. Note that the default values are different for classification (sqrt(p) where p is number of variables in x) and regression (p/3).
`transform`	a function defining a transformation to be applied to the species data. For example, a square-root transformation would be entered as `transform=function(x){sqrt(x)}`. Default set to no transformation.
`maxLevel`	if maxLevel == 0, compute importance from marginal permutation distribution of each variable (the default). If maxLevel > 0, compute importance from conditional permutation distribution of each variable, permuted within 2^{maxLevel} partitions of correlated variables.
`corr.threshold`	if maxLevel > 0, OOB permuting is conditioned on partitions of variables having absolute correlation > corr.threshold.
`compact`	logical variable to choose the standard method or the compact method for aggregating importance measures across species. Choose `compact=TRUE` when memory problems cause this function to crash. Still experimental.
`nbin`	number of bins for compact option. Default set to 101.
`trace`	if `TRUE` show the progress. Default `FALSE`.
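
As a rough illustration of the `mtry` and `maxLevel` descriptions above, the following base-R sketch computes the default number of candidate predictors per split and the number of partitions implied by a given `maxLevel`. The helper names `default_mtry` and `n_partitions` are hypothetical and are not part of gradientForest. The flooring and the minimum of one variable follow the randomForest convention; that gradientForest applies exactly this rounding when `mtry=NULL` is an assumption.

```r
# Hypothetical helper (not part of gradientForest): default mtry for p
# predictor variables, assuming the randomForest rounding convention.
default_mtry <- function(p, type = c("regression", "classification")) {
  type <- match.arg(type)
  if (type == "regression") {
    max(floor(p / 3), 1)      # regression forests: p/3
  } else {
    max(floor(sqrt(p)), 1)    # classification forests: sqrt(p)
  }
}

# Hypothetical helper: number of partitions within which a variable is
# permuted when maxLevel > 0 (2^maxLevel, as stated for `maxLevel`).
n_partitions <- function(maxLevel) 2^maxLevel

default_mtry(12, "regression")      # 4
default_mtry(12, "classification")  # 3
n_partitions(3)                     # 8
```

With `maxLevel = 0` the same formula gives a single partition, which matches the marginal (unconditional) permutation described as the default.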

gradientForest uses extendedForest, an extended version of the package randomForest (Liaw and Wiener 2002) that retains all of the split values and fit improvements for each of the response variables (species catches in our case) for further analysis. For each predictor in each tree and each forest, gradientForest collates the split values retained by extendedForest along each gradient, together with their associated fit improvements for each species. Details on the method are given in Ellis et al. (2012) and applications are described in Pitcher et al. (2012).

`X`	data.frame of predictor variables.
`Y`	data.frame of response variables.
`result`	A named vector of species R^2 for those species for which the physical variables have some predictive power. For regression, these are for forests having positive R^2. For classification, these are for forests having error rate less than the base error rate 2p(1-p) with p the proportion of presences.
`overall.imp`	The mean raw accuracy importance, one value per physical variable.
`overall.imp2`	The mean raw impurity importance, one value per physical variable.
`ntree`	number of bootstrap trees generated by randomForest.
`species.pos.rsq`	The number of species for which the physical variables have some predictive power.
`res`	Amalgamated data frame containing splits for all species and physical variables, as well as certain derived quantities.
`res.u`	Unique rows of res with columns restricted to "spec", "var", "rsq", "improve.tot" and "improve.tot.var".
`dens`	List of Gaussian kernel density estimates for each physical variable.
`call`	the matched call
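
The classification criterion mentioned under `result` can be sketched in base R. The function name `base_error_rate` and the presence/absence vector are hypothetical and made up for illustration; only the formula 2p(1-p) comes from the description above.

```r
# Hypothetical helper: base error rate 2p(1-p), where p is the proportion
# of presences of a species across sites.
base_error_rate <- function(presence) {
  p <- mean(presence)
  2 * p * (1 - p)
}

# Made-up presence (1) / absence (0) records for one species at 10 sites
presence <- c(1, 0, 0, 1, 0, 0, 0, 1, 0, 0)
base_error_rate(presence)   # p = 0.3, so 2 * 0.3 * 0.7 = 0.42
```

Under this reading, a classification forest for that species would count as having predictive power only if its error rate fell below 0.42.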

Original code written by N. Ellis, CSIRO, Cleveland, Australia. <Nick.Ellis@csiro.au>. Modified by S.J. Smith, DFO, Dartmouth, NS, Canada. <Stephen.Smith@dfo-mpo.gc.ca>

Breiman, L. (2001) Random Forests. Machine Learning, 45(1), 5–32.

Ellis, N., Smith, S.J., and Pitcher, C.R. (2012) Gradient Forests: calculating importance gradients on physical predictors. Ecology, 93, 156–168.

Liaw, A. and Wiener, M. (2002) Classification and regression by randomForest. R News, 2(3), 18–22. http://CRAN.R-project.org/doc/Rnews/

Pitcher, C.R., P. Lawton, N. Ellis, S.J. Smith, L.S. Incze, C-L. Wei, M.E. Greenlaw, N.H. Wolff, J.A. Sameoto, P.V.R. Snelgrove. (2012) Exploring the role of environmental variables in shaping patterns of biodiversity composition in seabed assemblages. Journal of Applied Ecology, 49, 670–679.

Strobl, C., Boulesteix, A.-L., Kneib, T., Augustin, T. and Zeileis, A. (2008) Conditional variable importance for random forests. BMC Bioinformatics, 9, 307–317. Open Access: http://www.biomedcentral.com/1471-2105/9/307

print.gradientForest, plot.gradientForest

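# Load the simulated data set used in this example
data(CoMLsimulation)
preds <- colnames(Xsimulation)   # predictor column names
specs <- colnames(Ysimulation)   # species (response) column names
# Fit a gradient forest with 10 trees
f1 <- gradientForest(data.frame(Ysimulation, Xsimulation), preds, specs, ntree=10)
f1   # display the fitted object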