sens.dynaTree: Monte Carlo Sensitivity Analysis for dynaTree Models

View source: R/predict.R

sens.dynaTree {dynaTree}    R Documentation

Monte Carlo Sensitivity Analysis for dynaTree Models

Description

A Monte Carlo sensitivity analysis using random Latin hypercube samples (LHSs) or bootstrap resamples for each particle to estimate main effects as well as 1st order and total sensitivity indices.

Usage

## S3 method for class 'dynaTree'
sens(object, class = NULL, nns = 1000, nME = 100,
              span = 0.3, method = c("lhs", "boot"),
              lhs = NULL, categ = NULL, verb = 0)

Arguments

object

a "dynaTree"-class object built by dynaTree

class

only valid for object$model = "class", allows the user to specify the subset of class labels in unique(object$y) for which sensitivity indices are calculated. The implementation loops over the vector of labels provided. The default of NULL results in class = unique(object$y)

nns

A positive integer scalar indicating the size of each LHS or bootstrap resample drawn for use in the Monte Carlo integration scheme underlying the sensitivity analysis; the total number of predictive locations is nns*(ncol(X)+2)

nME

A positive integer scalar indicating number of grid points, in each input dimension, upon which main effects will be estimated

span

A positive real-valued scalar giving the smoothing parameter for main effects integration: the fraction of nns points that will be included in a moving average window that is used to estimate main effects at the nME locations in each input dimension
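As a rough illustration of the windowed average (a base R sketch, not dynaTree's internal code), consider one input dimension with toy x/y data standing in for the LHS inputs and the predicted response:

```r
## Illustrative base R sketch (not dynaTree internals): a moving
## average over the fraction `span` of nns sample points, evaluated
## at nME grid locations in one input dimension
set.seed(1)
x <- runif(1000)                        # stand-in for one LHS input column
y <- sin(2*pi*x) + rnorm(1000, sd=0.1)  # stand-in for predicted response
grid <- seq(0, 1, length=100)           # nME grid points
span <- 0.3
k <- ceiling(span * length(x))          # window holds span*nns points
me <- sapply(grid, function(g) {
  nb <- order(abs(x - g))[1:k]          # k nearest sample points to g
  mean(y[nb])                           # smoothed main effect at g
})
```

Larger span values give smoother (but more biased) main-effect curves.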

method

indicates whether a Latin hypercube sample ("lhs") or a bootstrap resample of the object$X locations ("boot") should be used to generate the predictive locations

lhs

if method = "lhs" then this argument should be a list with entries rect, shape and mode describing the marginal distributions of the Latin hypercube; the default of NULL gives a default specification, and is required when method = "boot". The fields should have the following format(s):

  • rect: Optional rectangle describing the domain of the uncertainty distribution with respect to which the sensitivity is to be determined. This defines the domain from which the LH sample is to be taken. The rectangle should be a matrix or data.frame with ncol(rect) = 2, and number of rows equal to the dimension of the domain. For 1-d data, a vector of length 2 is allowed. The default is the input data range of each column of (object$X).

  • shape: Optional vector of shape parameters for Beta marginal distributions having length ncol(object$X) and elements > 1, i.e., concave Beta distributions. If specified, the uncertainty distribution (i.e. the LHS) is proportional to a joint pdf formed by independent Beta distributions in each dimension of the domain, scaled and shifted to have support defined by rect. If unspecified, the uncertainty distribution is uniform over rect. The specification shape[i]=0 instructs sens to treat the i-th dimension as a binary variable. In this case, mode[i] is the probability parameter for a Bernoulli uncertainty distribution, and we must also have rect[i,]=c(0,1).

  • mode: Optional vector of mode values for the Beta uncertainty distribution. Vector of length equal to the dimension of the domain, with elements within the support defined by rect. If shape is specified, but this is not, then the scaled Beta distributions will be symmetric.
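For concreteness, a valid lhs specification for a 2-d input space might look as follows (the numeric values here are purely illustrative):

```r
## Illustrative lhs specification for a 2-d input space: concave
## Beta marginals (shape > 1) with chosen modes, over rect
lhs.spec <- list(
  rect  = rbind(c(0, 1), c(0, 2)),  # row i gives c(min, max) for input i
  shape = c(2, 3),                  # Beta shape parameters, each > 1
  mode  = c(0.5, 1.5)               # modes, inside the corresponding rect row
)
## a binary input i would instead use shape[i] = 0, rect[i,] = c(0,1),
## and mode[i] equal to the Bernoulli probability parameter
```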

categ

A vector of logicals of length ncol(object$X) indicating which, if any, dimensions of the input space should be treated as categorical; this input is used to help set the default lhs$shape argument if not specified; the default categ argument is NULL meaning that the categorical inputs are derived from object$X in a sensible way

verb

a positive scalar integer indicating how many particles should be processed (iterations) before a progress statement is printed to the console; a (default) value of verb = 0 is quiet

Details

Saltelli (2002) describes a Latin Hypercube sampling based method for estimation of the 'Sobol' sensitivity indices:

1st Order for input i,

S(i) = Var(E[f | x_i]) / Var(f),

where x_i is the i-th input.

Total Effect for input i,

T(i) = E[Var(f | x_{-i})] / Var(f),

where x_{-i} is all inputs except for the i-th.

All moments are with respect to the appropriate marginals of the uncertainty distribution U – that is, the probability distribution on the inputs with respect to which sensitivity is being investigated. Under this approach, the integrals involved are approximated through averages over properly chosen samples based on two LH samples proportional to U. If nns is the sample size for the Monte Carlo estimate, this scheme requires nns*(ncol(X)+2) function evaluations.
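This scheme can be sketched in base R for a known function, independent of dynaTree. The estimators below follow the common Saltelli/Jansen forms, so treat this as a sketch rather than the package's exact implementation; for the additive function f(x) = x1 + 2*x2 with uniform inputs the true indices are S = T = (0.2, 0.8):

```r
## Base R sketch of the Saltelli Monte Carlo scheme for a known
## function (illustrative; not dynaTree's implementation). With
## f(x) = x1 + 2*x2 and uniform inputs, analytically S = T = (0.2, 0.8)
set.seed(1)
f <- function(X) X[,1] + 2*X[,2]
n <- 1e5; d <- 2
A <- matrix(runif(n*d), n, d)        # first sample (an LHS in the real method)
B <- matrix(runif(n*d), n, d)        # second, independent sample
fA <- f(A); fB <- f(B)
varf <- var(c(fA, fB))
S <- Tot <- numeric(d)
for (i in 1:d) {
  ABi <- A; ABi[,i] <- B[,i]         # A with column i taken from B
  fABi <- f(ABi)
  S[i]   <- mean(fB * (fABi - fA)) / varf     # 1st-order index estimate
  Tot[i] <- mean((fA - fABi)^2) / (2 * varf)  # total-effect estimate
}
## total cost: fA, fB, and d swapped sets = n*(d+2) evaluations
```

The n*(d+2) evaluation count is exactly the nns*(ncol(X)+2) figure quoted above.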

The sens.dynaTree function implements the method for unknown functions f, through prediction via the dynaTree regression model conditional on an observed set of X locations. For each particle, treated as a sample from the dynaTree model posterior, the nns*(ncol(X)+2) locations are drawn randomly from the LHS scheme and realizations of the sensitivity indices are calculated. Thus we obtain a posterior sample of the indices, incorporating variability from both the Monte Carlo estimation and uncertainty about the function output. Since a subset of the predictive locations are actually an LHS proportional to the uncertainty distribution, we can also estimate the main effects through simple non-parametric regression (a moving average).

See the Gramacy, Taddy, & Wild (2011) reference below for more details.

If method = "boot" is used then the LHS above is simply replaced with a bootstrap resample of the object$X locations.
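In that case the predictive locations come from resampling rows of the design; a minimal base R sketch, with X standing in for object$X:

```r
## base R sketch: the bootstrap alternative to an LHS draws the
## predictive locations by resampling rows of the design matrix
set.seed(1)
X <- matrix(runif(20), 10, 2)            # stand-in for object$X
nns <- 1000
idx <- sample(nrow(X), nns, replace = TRUE)
Xboot <- X[idx, , drop = FALSE]          # nns resampled design rows
```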

As with prediction, the dynaTrees function enables repeated calls to sens.dynaTree.

Value

The object returned is of class "dynaTree", which includes a copy of the list elements from the object passed in, with the following (sensitivity-analysis specific) additions.

MEgrid

An nME-by-ncol(object$X) matrix containing the main effects predictive grid at which the following MEmean, MEq1, and MEq2 quantities were obtained

MEmean

A matrix with ncol(object$X) columns and nME rows containing the mean main effects for each input dimension

MEq1

same as MEmean but containing the 5% quantiles

MEq2

same as MEmean but containing the 95% quantiles

S

A matrix with object$N rows and ncol(object$X) columns containing the posterior samples of the 1st Order Sobol sensitivity indices

T

same as S but containing the Total Effect indices

In the case of object$model = "class" the entries listed above will themselves be lists with an entry for each class specified on input, or all classes as is the default

Note

The quality of the sensitivity analysis depends on the size of the LHSs used for integral approximation; as with any Monte Carlo integration scheme, the sample size (nns) must increase with the dimensionality of the problem. The total sensitivity indices T are forced non-negative, and if negative values occur it is necessary to increase nns. Postprocessing replaces negative values with NA.

Author(s)

Robert B. Gramacy rbg@vt.edu,
Matt Taddy and Christoforos Anagnostopoulos

References

Saltelli, A. (2002) Making best use of model evaluations to compute sensitivity indices. Computer Physics Communications, 145, 280-297.

Gramacy, R.B., Taddy, M.A., and Wild, S. (2011). "Variable Selection and Sensitivity Analysis via Dynamic Trees with an Application to Computer Code Performance Tuning." arXiv:1108.4739

https://bobby.gramacy.com/r_packages/dynaTree/

See Also

dynaTree, predict.dynaTree, relevance.dynaTree, varpropuse, varproptotal

Examples

## friedman data
if(require("tgp")) {
  f <- friedman.1.data(1000)
  X <- f[,1:6]
  Z <- f$Y
  
  ## fit the model and do the sensitivity analysis
  N <- 100 ## use N >= 1000 for better results
  ## small N is for fast CRAN checks
  out <- dynaTree(X=X, y=Z, N=N, ab=c(0.01,2))
  ## also try with model="linear"
  
  ## gather relevance statistics
  out <- relevance(out)
  boxplot(out$relevance)
  abline(h=0, col=2, lty=2)
  ## relevance stats are not as useful when model="linear"
  ## since it will appear that x4 and x5 not helpful; these
  ## interact linearly with the response
  
  ## full simulation-based sensitivity analysis, the dynaTree::
  ## part is only needed if the tgp package is loaded
  out <- dynaTree::sens(out, verb=100)
  
  ## plot the main effects
  r <- range(c(out$MEmean, out$MEq1, out$MEq2))
  par(mfrow=c(1,ncol(out$X)), mar=c(5,3,2,1))
  plot(out$MEgrid[,1], out$MEmean[,1], type="l", ylim=r, lwd=2,
       ylab="", xlab=colnames(out$MEmean)[1])
  lines(out$MEgrid[,1], out$MEq1[,1], lty=2, lwd=2)
  lines(out$MEgrid[,1], out$MEq2[,1], lty=2, lwd=2)
  if(ncol(out$X) > 1) {
    for(d in 2:ncol(out$X)) {
      plot(out$MEgrid[,d], out$MEmean[,d], col=d, type="l", ylim=r,
           lwd=2, xlab=colnames(out$MEmean)[d], ylab="")
      lines(out$MEgrid[,d], out$MEq1[,d], col=d, lty=2)
      lines(out$MEgrid[,d], out$MEq2[,d], col=d, lty=2)
    }
  }
  
  ## Sobol indices
  par(mfrow=c(1,2), mar=c(5,4,4,2))
  boxplot(out$S, main="first order indices", xlab="inputs")
  boxplot(out$T, main="total indices", xlab="inputs")
  ## these look better when model="linear"
  
  ## clean up
  deletecloud(out)
  
  ## for a classification example using the sensitivity hooks
  ## in the dynaTrees function,  see the class2d demo
  ## i.e., demo("class2d")
}

dynaTree documentation built on Aug. 23, 2023, 9:07 a.m.