sens.dynaTree: Monte Carlo Sensitivity Analysis (R Documentation)
Description

A Monte Carlo sensitivity analysis using random Latin hypercube samples (LHSs) or bootstrap resamples for each particle to estimate main effects as well as 1st-order and total sensitivity indices.
Usage

## S3 method for class 'dynaTree'
sens(object, class = NULL, nns = 1000, nME = 100,
     span = 0.3, method = c("lhs", "boot"),
     lhs = NULL, categ = NULL, verb = 0)
Arguments

object
    a "dynaTree"-class object built by dynaTree

class
    only valid for object$model = "class"; a vector of class labels to which the sensitivity analysis should be restricted; the default of NULL uses all classes

nns
    a positive integer scalar indicating the size of each LHS or bootstrap drawn for use in the Monte Carlo integration scheme underlying the sensitivity analysis; the total number of locations is nns*(ncol(object$X)+2)

nME
    a positive integer scalar indicating the number of grid points, in each input dimension, upon which main effects will be estimated

span
    a positive real-valued scalar giving the smoothing parameter for main effects integration: the fraction of the nns locations included in the moving-average window used to estimate the main effects at each of the nME grid points

method
    indicates whether a Latin hypercube sample ("lhs") or a bootstrap resample of object$X ("boot") should be used

lhs
    if method = "lhs", a list describing the marginal distributions of the Latin hypercube, as for the lhs argument to sens in the tgp package; ignored when method = "boot"

categ
    a vector of logicals of length ncol(object$X) indicating which, if any, input dimensions should be treated as categorical; the default of NULL treats all inputs as real-valued

verb
    a positive scalar integer indicating how many predictive locations (iterations) after which a progress statement should be printed to the console; a (default) value of verb = 0 is quiet
Details

Saltelli (2002) describes a Latin hypercube sampling based method for estimation of the Sobol sensitivity indices:

1st Order for input i:

    S(i) = \mbox{Var}(E[f|x_i])/\mbox{Var}(f),

where x_i is the i-th input.

Total Effect for input i:

    T(i) = E[\mbox{Var}(f|x_{-i})]/\mbox{Var}(f),

where x_{-i} is all inputs except the i-th.

All moments are with respect to the appropriate marginals of the uncertainty distribution U, that is, the probability distribution on the inputs with respect to which sensitivity is being investigated. Under this approach, the integrals involved are approximated through averages over properly chosen samples based on two LH samples proportional to U. If nns is the sample size for the Monte Carlo estimate, this scheme requires nns*(ncol(X)+2) function evaluations.
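The sampling scheme above can be sketched for a known test function. This is a minimal, hedged illustration, not code from the dynaTree package: it uses plain uniform Monte Carlo draws in place of a genuine Latin hypercube, and the later Saltelli/Jansen estimator forms rather than the 2002 formulas, but the nns*(ncol(X)+2) evaluation structure is the same.

```r
## toy function with an interaction and an inert third input
f <- function(X) X[,1] + 2*X[,2] + X[,1]*X[,2]

nns <- 10000; d <- 3
A <- matrix(runif(nns*d), nns, d)  ## first sample
B <- matrix(runif(nns*d), nns, d)  ## second, independent sample
fA <- f(A); fB <- f(B)
V <- var(c(fA, fB))                ## total variance of f

S <- Tt <- numeric(d)
for(i in 1:d) {
  ABi <- A; ABi[,i] <- B[,i]       ## A with column i taken from B
  fABi <- f(ABi)                   ## d extra blocks: nns*(d+2) evaluations total
  S[i]  <- mean(fB * (fABi - fA)) / V   ## 1st-order index estimate
  Tt[i] <- mean((fA - fABi)^2) / (2*V)  ## total-effect index estimate
}
round(rbind(S, Tt), 2)  ## the inert x3 gives exactly zero in both rows
```

Because f has an interaction between x1 and x2, each 1st-order index is strictly smaller than the corresponding total-effect index, while the inert x3 scores zero on both.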
The sens.dynaTree function implements the method for unknown functions f, through prediction via one of the tgp regression models conditional on an observed set of X locations. For each particle, treated as a sample from the dynaTree model posterior, the nns*(ncol(X)+2) locations are drawn randomly from the LHS scheme and realizations of the sensitivity indices are calculated. Thus we obtain a posterior sample of the indices, incorporating variability from both the Monte Carlo estimation and uncertainty about the function output. Since a subset of the predictive locations are actually an LHS proportional to the uncertainty distribution, we can also estimate the main effects through simple non-parametric regression (a moving average). See the Gramacy, Taddy & Wild (2011) reference below for more details.

If method = "boot" is used then simply replace the LHS above with a bootstrap resample of the object$X locations.
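For example, assuming a fitted "dynaTree" object named out (the name is hypothetical here), the bootstrap variant is requested via the method argument:

```r
## sensitivity analysis with bootstrap resamples of out$X
## in place of fresh Latin hypercube draws
out <- sens(out, method = "boot")
```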
As with prediction, the dynaTrees function enables repeated calls to sens.dynaTree.
Value

The object returned is of class "dynaTree", which includes a copy of the list elements from the object passed in, with the following (sensitivity-analysis specific) additions.

MEgrid
    an nME-by-ncol(object$X) matrix containing the main-effects predictive grid at which the following MEmean, MEq1 and MEq2 quantities were obtained

MEmean
    a matrix with nME rows and ncol(object$X) columns containing the estimated mean main effects for each input dimension

MEq1
    same as MEmean, but containing the 5% quantiles

MEq2
    same as MEmean, but containing the 95% quantiles

S
    a matrix with object$N rows and ncol(object$X) columns containing the posterior sample of the 1st-order Sobol sensitivity indices, one realization per particle

T
    same as S, but containing the total-effect indices

In the case of object$model = "class" the entries listed above will themselves be lists with an entry for each class specified on input, or all classes as is the default.
Note

The quality of the sensitivity analysis depends on the size of the LHSs used for integral approximation; as with any Monte Carlo integration scheme, the sample size (nns) must increase with the dimensionality of the problem. The total sensitivity indices T are forced non-negative; if negative values occur it is necessary to increase nns. Postprocessing replaces negative values with NA.
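A quick hedged check for this situation, again assuming a fitted object named out with sensitivity results attached:

```r
## NA entries in out$T indicate the Monte Carlo sample was too small;
## re-run the analysis with a larger nns
if(any(is.na(out$T))) out <- sens(out, nns = 10000)
```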
Author(s)

Robert B. Gramacy (rbg@vt.edu), Matt Taddy, and Christoforos Anagnostopoulos
References

Saltelli, A. (2002). Making best use of model evaluations to compute sensitivity indices. Computer Physics Communications, 145, 280-297.

Gramacy, R.B., Taddy, M.A., and Wild, S. (2011). "Variable Selection and Sensitivity Analysis via Dynamic Trees with an Application to Computer Code Performance Tuning." arXiv:1108.4739.
https://bobby.gramacy.com/r_packages/dynaTree/
See Also

dynaTree, predict.dynaTree, relevance.dynaTree, varpropuse, varproptotal
Examples

## friedman data
if(require("tgp")) {
f <- friedman.1.data(1000)
X <- f[,1:6]
Z <- f$Y
## fit the model and do the sensitivity analysis
N <- 100 ## use N >= 1000 for better results
## small N is for fast CRAN checks
out <- dynaTree(X=X, y=Z, N=N, ab=c(0.01,2))
## also try with model="linear"
## gather relevance statistics
out <- relevance(out)
boxplot(out$relevance)
abline(h=0, col=2, lty=2)
## relevance stats are not as useful when model="linear"
## since it will appear that x4 and x5 not helpful; these
## interact linearly with the response
## full simulation-based sensitivity analysis, the dynaTree::
## part is only needed if the tgp package is loaded
out <- dynaTree::sens(out, verb=100)
## plot the main effects
r <- range(c(out$MEmean, out$MEq1, out$MEq2))
par(mfrow=c(1,ncol(out$X)), mar=c(5,3,2,1))
plot(out$MEgrid[,1], out$MEmean[,1], type="l", ylim=r, lwd=2,
ylab="", xlab=colnames(out$MEmean)[1])
lines(out$MEgrid[,1], out$MEq1[,1], lty=2, lwd=2)
lines(out$MEgrid[,1], out$MEq2[,1], lty=2, lwd=2)
if(ncol(out$X) > 1) {
for(d in 2:ncol(out$X)) {
plot(out$MEgrid[,d], out$MEmean[,d], col=d, type="l", ylim=r,
lwd=2, xlab=colnames(out$MEmean)[d], ylab="")
lines(out$MEgrid[,d], out$MEq1[,d], col=d, lty=2)
lines(out$MEgrid[,d], out$MEq2[,d], col=d, lty=2)
}
}
## Sobol indices
par(mfrow=c(1,2), mar=c(5,4,4,2))
boxplot(out$S, main="first order indices", xlab="inputs")
boxplot(out$T, main="total indices", xlab="inputs")
## these look better when model="linear"
## clean up
deletecloud(out)
## for a classification example using the sensitivity hooks
## in the dynaTrees function, see the class2d demo
## i.e., demo("class2d")
}