Calculate relevance statistics for input coordinates

Description

Computes relevance statistics for each input coordinate by calculating its particle-averaged mean reduction in variance each time that coordinate is used as a splitting variable in (an internal node of) the tree(s).

Usage

relevance.dynaTree(object, rect = NULL, categ = NULL,
     approx = FALSE, verb = 0)

Arguments

object

a "dynaTree"-class object built by dynaTree

rect

an optional matrix with two columns and ncol(object$X) rows describing the bounding rectangle for the ALC integration; the default that is used when rect = NULL is the bounding rectangle obtained by applying range to each column of object$X (taking care to remove the first/intercept column of object$X if icept = "augmented")

categ

a vector of logicals of length ncol(object$X) indicating which, if any, dimensions of the input space should be treated as categorical; the default categ argument is NULL, meaning that the categorical inputs are derived from object$X in a sensible way

approx

a scalar logical indicating if the count of the number of data points in the leaf should be used in place of its area; this can help with numerical accuracy in high dimensional input spaces

verb

a non-negative scalar integer indicating how many particles should be processed (iterations) before a progress statement is printed to the console; the default value of verb = 0 is quiet
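The layout expected for rect can be sketched in base R. This is a minimal illustration, assuming a toy design matrix X that stands in for object$X (with no intercept column); the data values are made up for the example.

```r
## Toy design matrix standing in for object$X (illustrative values only)
set.seed(1)
X <- cbind(x1 = runif(20), x2 = runif(20, -2, 2), x3 = rnorm(20))

## apply(X, 2, range) gives a 2 x ncol(X) matrix (min on row 1, max on row 2);
## transposing yields one row per input and two columns (lower, upper)
rect <- t(apply(X, 2, range))

dim(rect)   # ncol(X) rows and 2 columns
```

A matrix built this way mirrors the default described for rect = NULL; passing a narrower rectangle would restrict the region instead.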

Details

Each binary split in the tree (in each particle) emits a reduction in variance (for regression models) or a reduction in entropy (for classification). This function calculates these reductions and attributes them to the variable(s) involved in the split(s). Those with the largest relevances are the most useful for prediction. A sensible variable selection rule based on these relevances is to discard those variables whose median relevance is not positive. See the Gramacy, Taddy, & Wild (2011) reference below for more details.
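The median-based selection rule can be sketched in base R as follows. The matrix rel is a simulated stand-in for the relevance entry of the returned object (one row per particle, one column per input); its values are invented purely for illustration.

```r
## Simulated stand-in for object$relevance: 100 particles, 3 inputs
set.seed(42)
rel <- cbind(rnorm(100, mean = 2,    sd = 0.5),    # input 1: clearly relevant
             rnorm(100, mean = 0.5,  sd = 0.2),    # input 2: mildly relevant
             rnorm(100, mean = -0.1, sd = 0.05))   # input 3: not relevant

## Median relevance of each input across particles
med <- apply(rel, 2, median)

## Keep only the inputs whose median relevance is positive
keep <- which(med > 0)
keep
```

A boxplot of the matrix with a horizontal line at zero, for example boxplot(rel); abline(h = 0, lty = 2), gives a quick visual check of the same rule.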

The new set of particles is appended to the old set. However, after a subsequent update.dynaTree call the total number of particles reverts to the original amount.

Note that this does not work well with dynaTree objects which were built with model="linear". Rather, a full sensitivity analysis (sens.dynaTree) is needed. Usually it is best to first do model="constant" and then use relevance.dynaTree. Bayes factors (getBF) can be used to back up any variable selections implied by the relevance. Then, if desired, one can re-fit on the new (possibly reduced) set of predictors with model="linear".

There are no caveats with model="class".
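The constant-then-linear workflow suggested above might look like the following sketch. The synthetic data, the particle count N = 200, and the variable names are illustrative assumptions; the call goes through the generic relevance(), which is assumed here to dispatch to relevance.dynaTree, and deleteclouds() is assumed to free all particle clouds. The block does nothing if the dynaTree package is not installed.

```r
if (requireNamespace("dynaTree", quietly = TRUE)) {
  library(dynaTree)

  ## Synthetic data (illustrative): only the first two of four inputs matter
  set.seed(1)
  X <- matrix(runif(200 * 4), ncol = 4)
  y <- 10 * sin(pi * X[, 1]) + 5 * X[, 2]^2 + rnorm(200, sd = 0.5)

  ## Step 1: fit with constant leaves, then gather relevance statistics
  fit <- dynaTree(X, y, N = 200, model = "constant", verb = 0)
  fit <- relevance(fit)

  ## Step 2: keep the inputs whose median relevance is positive
  med  <- apply(fit$relevance, 2, median)
  keep <- which(med > 0)

  ## Step 3: optionally re-fit with linear leaves on the reduced inputs
  if (length(keep) > 0) {
    fit2 <- dynaTree(X[, keep, drop = FALSE], y, N = 200,
                     model = "linear", verb = 0)
  }

  ## Free the particle clouds allocated on the C side
  deleteclouds()
}
```

Because the fits are stochastic, the selected set can vary between runs; comparing the reduced and full fits with getBF, as mentioned above, is one way to check the selection.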

Value

The entire object is returned with a new entry called relevance containing a matrix with ncol(X) columns. There is one row per particle, and each row contains that particle's sample of the relevance of each input.

Author(s)

Robert B. Gramacy rbgramacy@chicagobooth.edu,
Matt Taddy taddy@chicagobooth.edu, and
Christoforos Anagnostopoulos christoforos.anagnostopoulos06@imperial.ac.uk

References

Gramacy, R.B., Taddy, M.A., and S. Wild (2011). “Variable Selection and Sensitivity Analysis via Dynamic Trees with an Application to Computer Code Performance Tuning.” arXiv:1108.4739

http://bobby.gramacy.com/r_packages/dynaTree/

See Also

dynaTree, sens.dynaTree, predict.dynaTree, varpropuse, varproptotal

Examples

## see the examples in sens.dynaTree for the relevances;
## Also see varpropuse and the class2d demo via
## demo("class2d")