relevance.dynaTree: Calculate relevance statistics for input coordinates

In dynaTree: Dynamic Trees for Learning and Design

Description

Computes relevance statistics for each input coordinate: the particle-averaged mean reduction in variance obtained each time that coordinate is used as a splitting variable in an internal node of the tree(s).

Usage

```r
relevance.dynaTree(object, rect = NULL, categ = NULL, approx = FALSE, verb = 0)
```

Arguments

`object` a `"dynaTree"`-class object built by `dynaTree`

`rect` an optional `matrix` with two columns and `ncol(object$X)` rows describing the bounding rectangle for the ALC integration; the default that is used when `rect = NULL` is the bounding rectangle obtained by applying `range` to each column of `object$X` (taking care to remove the first/intercept column of `object$X` if `icept = "augmented"`)

`categ` a vector of logicals of length `ncol(object$X)` indicating which, if any, dimensions of the input space should be treated as categorical; the default `categ` argument is `NULL`, meaning that the categorical inputs are derived from `object$X` in a sensible way

`approx` a scalar logical indicating if the count of the number of data points in the leaf should be used in place of its area; this can help with numerical accuracy in high dimensional input spaces

`verb` a non-negative scalar integer indicating how many particles should be processed (iterations) before a progress statement should be printed to the console; a (default) value of `verb = 0` is quiet
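The default bounding rectangle described for `rect` can be reproduced in base R. The following is a minimal sketch, assuming a hypothetical design matrix `X` that stands in for `object$X` with no intercept column; it is not taken from the package source.

```r
# Hypothetical design matrix standing in for object$X (no intercept column)
set.seed(1)
X <- cbind(x1 = runif(20),
           x2 = runif(20, min = -1, max = 1),
           x3 = rnorm(20))

# Apply range() to each column; apply() returns a 2 x ncol(X) matrix,
# so transpose to get one row per input with columns (min, max)
rect <- t(apply(X, 2, range))
colnames(rect) <- c("min", "max")

dim(rect)  # ncol(X) rows and two columns, the shape expected for `rect`
```

A matrix of this shape could also be built by hand to restrict the rectangle to a sub-region of the input space.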

Details

Each binary split in the tree (in each particle) emits a reduction in variance (for regression models) or a reduction in entropy (for classification). This function calculates these reductions and attributes them to the variable(s) involved in the split(s). Those with the largest relevances are the most useful for prediction. A sensible variable selection rule based on these relevances is to discard those variables whose median relevance is not positive. See the Gramacy, Taddy, & Wild (2011) reference below for more details.
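The median-based selection rule can be illustrated in base R. This sketch uses a synthetic matrix `rel` (an assumption made for illustration) laid out like the returned `relevance` entry, with one row per particle and one column per input; real values would come from a fitted object.

```r
# Hypothetical relevance samples: one row per particle, one column per input.
# In practice this matrix would be the `relevance` entry of the returned object.
set.seed(42)
rel <- cbind(
  x1 = rnorm(100, mean = 2.0, sd = 0.5),   # clearly relevant
  x2 = rnorm(100, mean = 0.5, sd = 0.2),   # mildly relevant
  x3 = rnorm(100, mean = -0.3, sd = 0.1)   # not relevant
)

# Median relevance of each input, taken across particles
med <- apply(rel, 2, median)

# Keep only the inputs whose median relevance is positive
keep <- names(med)[med > 0]
keep
```

Using the median rather than the mean makes the rule less sensitive to a few particles with extreme relevance samples.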

The new set of particles is appended to the old set. However, after a subsequent `update.dynaTree` call the total number of particles reverts to the original number.

Note that relevance statistics do not work well with `dynaTree` objects that were built with `model="linear"`; for those, a full sensitivity analysis (`sens.dynaTree`) is needed. Usually it is best to first fit with `model="constant"` and then use `relevance.dynaTree`. Bayes factors (`getBF`) can be used to back up any variable selections implied by the relevance. Then, if desired, one can re-fit on the new (possibly reduced) set of predictors with `model="linear"`.
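The constant-then-linear workflow might look roughly like the sketch below. It assumes the dynaTree package is installed; the simulated data, the small particle count `N = 100`, and the names `fit`, `fit.lin`, and `keep` are illustrative choices. It calls the generic `relevance`, as the package's `sens.dynaTree` examples do, on the assumption that it dispatches to `relevance.dynaTree`. The Bayes factor check with `getBF` is omitted.

```r
# Sketch only: runs when the dynaTree package is available
if (requireNamespace("dynaTree", quietly = TRUE)) {
  library(dynaTree)

  # Illustrative data: three inputs, the third is pure noise
  set.seed(1)
  n <- 200
  X <- matrix(runif(n * 3), ncol = 3)
  y <- sin(2 * pi * X[, 1]) + 2 * X[, 2] + rnorm(n, sd = 0.1)

  # Step 1: fit with constant leaves (small N keeps the example fast)
  fit <- dynaTree(X, y, N = 100, model = "constant", verb = 0)

  # Step 2: attach relevance samples (one row per particle, one column per input)
  fit <- relevance(fit)
  med <- apply(fit$relevance, 2, median)
  keep <- which(med > 0)   # inputs with positive median relevance

  # Step 3 (optional): re-fit on the retained inputs with linear leaves
  if (length(keep) > 0) {
    fit.lin <- dynaTree(X[, keep, drop = FALSE], y, N = 100,
                        model = "linear", verb = 0)
  }

  # Free the particle clouds held on the C side
  deleteclouds()
}
```

With so few particles the retained set can vary from run to run; a larger `N` would give more stable relevance samples.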

There are no caveats with `model="class"`.

Value

The entire `object` is returned with a new entry called `relevance` containing a `matrix` with `ncol(X)` columns. Each row contains one sample of the relevance of each input, and there is one row per particle.

Author(s)

Robert B. Gramacy,
Christoforos Anagnostopoulos

References

Gramacy, R.B., Taddy, M.A., and Wild, S. (2011). “Variable Selection and Sensitivity Analysis via Dynamic Trees with an Application to Computer Code Performance Tuning.” arXiv:1108.4739

See Also

`dynaTree`, `sens.dynaTree`, `predict.dynaTree`, `varpropuse`, `varproptotal`

Examples

```r
## see the examples in sens.dynaTree for the relevances;
## Also see varpropuse and the class2d demo via
## demo("class2d")
```

dynaTree documentation built on May 29, 2017, 10:14 p.m.