Calculate the ALC or predictive entropy statistic at the X locations, or ALC at new XX predictive locations

Share:

Description

Uses analytic integration (at the leaves) to calculate the (regression) ALC statistic, or calculates the predictive (class) entropy at the input (X) locations; or calculate ALC at new predictive locations either analytically or numerically

Usage

1
2
3
4
5
6
7
8
## S3 method for class 'dynaTree'
alcX(object, rect = NULL, categ = NULL,
     approx = FALSE, verb = 0)
## S3 method for class 'dynaTree'
entropyX(object, verb = 0)
## S3 method for class 'dynaTree'
alc(object, XX, rect = NULL, categ = NULL,
     approx = FALSE, Xref = NULL, probs = NULL, verb = 0)

Arguments

object

a "dynaTree"-class object built by dynaTree

rect

for alcX, a matrix with two columns and ncol(object$X) rows describing the bounding rectangle for the ALC integration; the default that is used when rect = NULL is the bounding rectangle obtained by applying range to each column of object$X (taking care to remove the first/intercept column of object$X if icept = "augmented"; only applies to regression models (object$model != "class"); for alc, rect must be a scalar logical: see Xref below

categ

A vector of logicals of length ncol(object$X) indicating which, if any, dimensions of the input space should be treated as categorical; this input is used to help with the analytic integrals from a rect-based calculation, which means it should not specified along with Xref; the default categ argument is NULL meaning that the categorical inputs are derived from object$X in a sensible way

approx

a scalar logical that, when TRUE, causes the number of data points in a node/leaf to be used as a proxy for its area in the analytic calculations

XX

a design matrix of predictive locations (where ncol(XX) == ncol(X); only used by alc

Xref

Xref input can be optionally used to specify a grid of reference locations for the numerical ALC calculation - a matrix with ncol(X) columns. If NULL, the default, then the XX is taken as both candidate and reference locations.

probs

weights for the reference locations to be used in a Monte Carlo approximation; usually these weights are class probabilities for response surfaces under constraints

verb

a positive scalar integer indicating how many predictive locations (iterations) after which a progress statement should be printed to the console; a (default) value of verb = 0 is quiet

Details

This function is most useful for selecting object$X locations to remove from the analysis, perhaps in an online inference setting. See retire.dynaTree for more details. The output is the same as using predict.dynaTree using XX = object$X, alc = "rect", and Xref = rect

entropyX only apples to classification models (object$model != "class"), and alcX applies (only) to the other, regression, models

The alc function is more generic and allows ALC calculations at new, predictive, XX locations. This functionality used to be part of the predict.dynaTree function, but were separated out for computational reasons. The previous version was Monte Carlo-based (using Xref) whereas the new version also allows analytic calculation (now the default, via rect)

Value

The entire object is returned with a new entry called alcX containing a vector of length nrow(X) with the ALC values, or entropyX containing the entropy values, or alc if general ALC calculations at new XX locations

Author(s)

Robert B. Gramacy rbgramacy@chicagobooth.edu,
Matt Taddy taddy@chicagobooth.edu, and
Christoforos Anagnostopoulos christoforos.anagnostopoulos06@imperial.ac.uk

References

Taddy, M.A., Gramacy, R.B., and Polson, N. (2011). “Dynamic trees for learning and design” Journal of the American Statistical Association, 106(493), pp. 109-123; arXiv:0912.1586

Anagnostopoulos, C., Gramacy. R.B. (2013) “Information-Theoretic Data Discarding for Dynamic Trees on Data Streams.” Entropy, 15(12), 5510-5535; arXiv:1201.5568

http://bobby.gramacy.com/r_packages/dynaTree/

See Also

dynaTree, predict.dynaTree, and retire.dynaTree

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
## fit the model to the parabola data
n <- 100
Xp <- runif(n,-3,3)
Yp <- Xp + Xp^2 + rnorm(n, 0, .2)
rect <- c(-3,3)
out <- dynaTree(Xp, Yp, model="linear", icept="augmented")

## calculate the alcX
out <- alcX(out, rect=rect)

## to compare to analytic 
out <- alc(out, XX=out$X[,-1], rect=rect)

## plot comparison between alcX and predict-ALC
plot(out$X[,-1], out$alcX)
o <- order(out$X[,2])
lines(out$X[o,-1], out$alc[o], col=2, lty=2)

## now compare to approximate analytic
## (which writes over out$alc)
out <- alc(out, XX=out$X[,-1], rect=rect, approx=TRUE)
lines(out$X[o,-1], out$alc[o], col=3, lty=3)

## clean up
deletecloud(out)

## similarly with entropyX for classification models

## see demo("design") for more iterations and
## design under other active learning heuristics
## like ALC, and EI for optimization; also see
## demo("online") for an online learning example where
## ALC is used for retirement