# alcX.dynaTree: Calculate the ALC or predictive entropy statistic at the X... In dynaTree: Dynamic Trees for Learning and Design

## Description

Uses analytic integration (at the leaves) to calculate the (regression) ALC statistic, or calculates the predictive (class) entropy, at the input (X) locations; or calculates ALC at new predictive locations, either analytically or numerically.
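
For a vector of class probabilities `p`, the predictive (class) entropy is the Shannon entropy `-sum(p * log(p))`. The base-R sketch below computes it row by row for a matrix of class probabilities. The helper name `pred_entropy` and the one-row-per-location layout are hypothetical; this shows the generic formula, not the internal code behind `entropyX`.

```r
## Hypothetical helper (not part of dynaTree): Shannon entropy of each row of
## a class-probability matrix P (rows = input locations, columns = classes).
pred_entropy <- function(P) {
  stopifnot(is.matrix(P), all(P >= 0))
  ## drop zero probabilities so that 0 * log(0) is treated as 0
  apply(P, 1, function(p) { p <- p[p > 0]; -sum(p * log(p)) })
}

P <- rbind(c(0.5, 0.5),   # maximally uncertain between two classes
           c(1, 0),       # certain: entropy 0
           c(0.25, 0.75))
pred_entropy(P)
```

Locations with high entropy are those where the predicted class is most uncertain.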

## Usage

```r
## S3 method for class 'dynaTree'
alcX(object, rect = NULL, categ = NULL, approx = FALSE, verb = 0)
## S3 method for class 'dynaTree'
entropyX(object, verb = 0)
## S3 method for class 'dynaTree'
alc(object, XX, rect = NULL, categ = NULL, approx = FALSE,
    Xref = NULL, probs = NULL, verb = 0)
```

## Arguments

- `object`: a `"dynaTree"`-class object built by `dynaTree`.
- `rect`: for `alcX`, a `matrix` with two columns and `ncol(object$X)` rows describing the bounding rectangle for the ALC integration. The default used when `rect = NULL` is the bounding rectangle obtained by applying `range` to each column of `object$X` (taking care to remove the first/intercept column of `object$X` if `icept = "augmented"`). Only applies to regression models (`object$model != "class"`). For `alc`, `rect` must be a scalar logical: see `Xref` below.
- `categ`: a vector of logicals of length `ncol(object$X)` indicating which, if any, dimensions of the input space should be treated as categorical. This input is used to help with the analytic integrals of a `rect`-based calculation, which means it should not be specified along with `Xref`. The default, `categ = NULL`, means that the categorical inputs are derived from `object$X` in a sensible way.
- `approx`: a scalar logical that, when `TRUE`, causes the number of data points in a node/leaf to be used as a proxy for its area in the analytic calculations.
- `XX`: a design `matrix` of predictive locations (where `ncol(XX) == ncol(X)`); only used by `alc`.
- `Xref`: an optional `matrix` with `ncol(X)` columns specifying a grid of reference locations for the numerical ALC calculation. If `NULL` (the default), `XX` is used as both the candidate and the reference locations.
- `probs`: weights for the reference locations, used in a Monte Carlo approximation; usually these weights are class probabilities for response surfaces under constraints.
- `verb`: a non-negative scalar integer giving the number of predictive locations (iterations) after which a progress statement is printed to the console; the default, `verb = 0`, is quiet.
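
The documented default for `rect` can be reproduced in base R, which helps when you want to pass an explicit rectangle that matches it. The helper `default_rect` below is a hypothetical sketch, assuming `X` is a plain numeric matrix laid out like `object$X`; it is not a dynaTree function.

```r
## Hypothetical helper (not exported by dynaTree): mimic the documented
## default rect by applying range() to each column of X, after dropping the
## first (intercept) column when the fit used icept = "augmented".
default_rect <- function(X, augmented = FALSE) {
  if (augmented) X <- X[, -1, drop = FALSE]
  ## apply() gives a 2 x p matrix (min, max); transpose to p x 2
  t(apply(X, 2, range))
}

X <- cbind(1, c(-2, 0.5, 3), c(10, 4, 7))  # first column is an intercept
default_rect(X, augmented = TRUE)
```

Each row of the result is one input dimension, with its minimum in the first column and its maximum in the second.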

## Details

This function is most useful for selecting `object$X` locations to remove from the analysis, perhaps in an online inference setting; see `retire.dynaTree` for more details. The output is the same as calling `predict.dynaTree` with `XX = object$X`, `alc = "rect"`, and `Xref = rect`.
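
Because `alcX` scores every existing input location, picking points to retire amounts to ranking those scores. The sketch below assumes, purely for illustration, that the least informative points are the ones with the smallest ALC values; `retire_candidates` is a hypothetical helper, and `retire.dynaTree` remains the reference for the package's actual retirement behavior.

```r
## Hypothetical helper (not part of dynaTree): indices of the k smallest
## ALC scores, ASSUMING low ALC marks a point as least informative.
retire_candidates <- function(alc_scores, k = 1) {
  stopifnot(is.numeric(alc_scores), k >= 1, k <= length(alc_scores))
  order(alc_scores)[seq_len(k)]
}

## toy scores standing in for out$alcX
scores <- c(0.8, 0.1, 0.5, 0.05, 0.9)
retire_candidates(scores, k = 2)
```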

`entropyX` only applies to classification models (`object$model == "class"`), and `alcX` applies only to the other (regression) models.

The `alc` function is more generic and allows ALC calculations at new predictive `XX` locations. This functionality used to be part of the `predict.dynaTree` function but was separated out for computational reasons. The previous version was Monte Carlo-based (using `Xref`), whereas the new version also allows analytic calculation (now the default, via `rect`).
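
The Monte Carlo route averages a per-reference-location quantity over the `Xref` grid, optionally weighted by `probs`. The sketch assumes a hypothetical matrix `delta` of variance reductions (rows for candidate `XX` locations, columns for `Xref` locations) and a normalized weighted mean; how dynaTree computes and scales these reductions internally is not shown here and may differ.

```r
## Generic weighted Monte Carlo average (a sketch, NOT dynaTree's internals).
## delta: hypothetical variance reductions, row i = candidate XX[i, ],
##        column j = reference location Xref[j, ]
## probs: optional non-negative weights for the reference locations
mc_alc <- function(delta, probs = NULL) {
  if (is.null(probs)) probs <- rep(1, ncol(delta))  # equal weights by default
  stopifnot(length(probs) == ncol(delta), all(probs >= 0), sum(probs) > 0)
  drop(delta %*% probs) / sum(probs)
}

delta <- rbind(c(0.2, 0.4, 0.0),
               c(0.1, 0.1, 0.1))
mc_alc(delta)                       # unweighted average over references
mc_alc(delta, probs = c(0, 1, 0))   # only the second reference counts
```

Weighting lets reference locations that matter more (for example, those likely to satisfy a constraint) dominate the estimate, which is the role the `probs` argument describes.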

## Value

The entire `object` is returned with a new entry: `alcX`, a vector of length `nrow(object$X)` containing the ALC values (from `alcX`); `entropyX`, containing the entropy values (from `entropyX`); or `alc`, containing the ALC values at the new `XX` locations (from `alc`).

## Author(s)

Robert B. Gramacy,
Christoforos Anagnostopoulos

## References

Taddy, M.A., Gramacy, R.B., and Polson, N. (2011). “Dynamic trees for learning and design.” Journal of the American Statistical Association, 106(493), pp. 109-123; arXiv:0912.1586.

Anagnostopoulos, C., and Gramacy, R.B. (2013). “Information-Theoretic Data Discarding for Dynamic Trees on Data Streams.” Entropy, 15(12), pp. 5510-5535; arXiv:1201.5568.

## See Also

`dynaTree`, `predict.dynaTree`, and `retire.dynaTree`
## Examples

```r
library(dynaTree)

## fit the model to the parabola data
n <- 100
Xp <- runif(n, -3, 3)
Yp <- Xp + Xp^2 + rnorm(n, 0, .2)
rect <- c(-3, 3)
out <- dynaTree(Xp, Yp, model = "linear", icept = "augmented")

## calculate the alcX
out <- alcX(out, rect = rect)

## compare with the analytic ALC from alc() at the same locations
out <- alc(out, XX = out$X[, -1], rect = rect)

## plot comparison between alcX and alc
plot(out$X[, -1], out$alcX)
o <- order(out$X[, 2])
lines(out$X[o, -1], out$alc[o], col = 2, lty = 2)

## now compare to approximate analytic
## (which writes over out$alc)
out <- alc(out, XX = out$X[, -1], rect = rect, approx = TRUE)
lines(out$X[o, -1], out$alc[o], col = 3, lty = 3)

## clean up
deletecloud(out)

## similarly with entropyX for classification models

## see demo("design") for more iterations and
## design under other active learning heuristics
## like ALC, and EI for optimization; also see
## demo("online") for an online learning example where
## ALC is used for retirement
```