friedman.1.data: First Friedman Dataset and a variation

friedman.1.data    R Documentation


Description

Generate X and Y values from the 10-dimensional “first” Friedman data set used to validate the Multivariate Adaptive Regression Splines (MARS) model, and from a variation involving boolean indicators. This test function has three non-linear, interacting variables, two linear ones, and five that are irrelevant. In the version with indicators, parts of the response are switched on or off according to the setting of the indicators.

Usage

friedman.1.data(n = 100)
fried.bool(n = 100)

Arguments

n

Number of samples desired

Details

In the original formulation, as implemented by friedman.1.data, the 10-dimensional inputs X are drawn from Unif(0,1), and the responses are N(m(\mathbf{x}),1), where m(\mathbf{x}) = E[f(\mathbf{x})] and

m(\mathbf{x}) = 10\sin(\pi x_1 x_2) + 20(x_3-0.5)^2 + 10x_4 + 5x_5
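The following is a minimal base-R sketch of the data-generating process above, written only to show how the X.1, ..., X.10, Y and Ytrue columns relate. The function name friedman1_sketch is hypothetical and this is not the package's own code; in practice you would simply call friedman.1.data(n).

```r
# Hypothetical re-implementation of the friedman.1.data model (not tgp's source).
friedman1_sketch <- function(n = 100) {
  # 10-dimensional inputs, each column drawn independently from Unif(0,1)
  X <- matrix(runif(n * 10), ncol = 10)
  # Mean function m(x): only x_1, ..., x_5 contribute; x_6, ..., x_10 are irrelevant
  Ytrue <- 10 * sin(pi * X[, 1] * X[, 2]) + 20 * (X[, 3] - 0.5)^2 +
    10 * X[, 4] + 5 * X[, 5]
  # Responses are N(m(x), 1): add standard normal noise to the mean
  Y <- Ytrue + rnorm(n, mean = 0, sd = 1)
  # data.frame() names the unnamed matrix columns X.1, ..., X.10
  data.frame(X = X, Y = Y, Ytrue = Ytrue)
}

set.seed(1)
d <- friedman1_sketch(200)
str(d)
```

Because every term of m(\mathbf{x}) is bounded on the unit cube (the sine term in [0,10], the quadratic in [0,5], the linear terms in [0,10] and [0,5]), Ytrue always lies in [0, 30], while Y can fall slightly outside that range because of the added noise.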

The variation fried.bool uses indicators I\in \{1,2,3,4\}. The function also has 10-dimensional inputs X with columns distributed as Unif(0,1), and the responses are N(m(\mathbf{x},I), 1), where m(\mathbf{x},I) = E[f(\mathbf{x},I)] and

m(\mathbf{x},I) = f_1(\mathbf{x})_{[I=1]} + f_2(\mathbf{x})_{[I=2]} + f_3(\mathbf{x})_{[I=3]} + m([x_{10},\cdots,x_1])_{[I=4]}

where

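f_1(\mathbf{x}) = 10\sin(\pi x_1 x_2), \; f_2(\mathbf{x}) = 20(x_3-0.5)^2, \; \mbox{and } f_3(\mathbf{x}) = 10x_4 + 5x_5.

A subscript [I=k] means that the term contributes only when I=k, so exactly one term is active for each sample; m([x_{10},\cdots,x_1]) is the original mean function applied to the inputs in reverse order.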
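To make the indicator notation concrete, here is a small base-R sketch of m(\mathbf{x},I) in which exactly one branch is active per row. The helper names f1, f2, f3, m_orig and m_bool are hypothetical, chosen for illustration only; they are not functions exported by tgp.

```r
# Hypothetical helpers mirroring the formulas above (not tgp's source).
f1 <- function(x) 10 * sin(pi * x[, 1] * x[, 2])
f2 <- function(x) 20 * (x[, 3] - 0.5)^2
f3 <- function(x) 10 * x[, 4] + 5 * x[, 5]
# Original Friedman mean: m(x) = f1(x) + f2(x) + f3(x)
m_orig <- function(x) f1(x) + f2(x) + f3(x)

# Mean of the indicator variation: row i uses only the branch selected by I[i].
m_bool <- function(x, I) {
  stopifnot(is.matrix(x), ncol(x) == 10, length(I) == nrow(x), all(I %in% 1:4))
  out <- numeric(nrow(x))
  out[I == 1] <- f1(x[I == 1, , drop = FALSE])
  out[I == 2] <- f2(x[I == 2, , drop = FALSE])
  out[I == 3] <- f3(x[I == 3, , drop = FALSE])
  # I = 4: the full m() applied to the inputs in reversed order x_10, ..., x_1
  out[I == 4] <- m_orig(x[I == 4, 10:1, drop = FALSE])
  out
}

# Every row is (0.1, 0.2, ..., 1.0); row k uses indicator I = k
x <- matrix(seq(0.1, 1, by = 0.1), nrow = 4, ncol = 10, byrow = TRUE)
round(m_bool(x, I = 1:4), 3)
# values: 0.628 0.800 6.500 14.890
```

For I=4 the reversed input is (1.0, 0.9, 0.8, 0.7, 0.6, ...), so the mean is 10\sin(0.9\pi) + 20(0.3)^2 + 7 + 3 \approx 14.89, illustrating that the last branch reuses the whole original function on different coordinates.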

The indicator I is coded in binary in the output data frame as: c(0,0,0) for I=1, c(0,0,1) for I=2, c(0,1,0) for I=3, and c(1,0,0) for I=4.
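The coding of I can be written as a lookup table, and because at most one of the three binary entries is switched on, it can be inverted with simple arithmetic. This is a sketch: the names ind_code, encode_I and decode_I are hypothetical, and it assumes that the three entries of each c(...) vector map, in order, to the output columns I.1, I.2 and I.3.

```r
# Hypothetical helpers for the documented indicator coding (not tgp's source).
# Row k of the lookup table holds (I.1, I.2, I.3) for I = k.
ind_code <- rbind(c(0, 0, 0),  # I = 1
                  c(0, 0, 1),  # I = 2
                  c(0, 1, 0),  # I = 3
                  c(1, 0, 0))  # I = 4

encode_I <- function(I) {
  stopifnot(all(I %in% 1:4))
  out <- ind_code[I, , drop = FALSE]
  colnames(out) <- paste0("I.", 1:3)
  out
}

decode_I <- function(Ib) {
  Ib <- as.matrix(Ib)
  # Each row must be 0/1 with at most one entry switched on
  stopifnot(ncol(Ib) == 3, all(Ib %in% c(0, 1)), all(rowSums(Ib) <= 1))
  # (0,0,0) -> 1, (0,0,1) -> 2, (0,1,0) -> 3, (1,0,0) -> 4
  as.integer(1 + Ib[, 3] + 2 * Ib[, 2] + 3 * Ib[, 1])
}

Ib <- encode_I(c(4, 1, 3, 2))
decode_I(Ib)
```

Keeping I=1 as the all-zero baseline means three columns suffice for four categories, in the same spirit as treatment contrasts for a factor.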

Value

Output is a data.frame with columns

X.1, ..., X.10

describing the 10-d randomly sampled inputs

I.1, ..., I.3

boolean (0/1) encoding of the indicator I, provided only by fried.bool, as described above

Y

sample responses (with N(0,1) noise)

Ytrue

true responses (without noise)

Note

An example using the original version of the data (friedman.1.data) is contained in the first package vignette, vignette("tgp"). The boolean version fried.bool is used in the second vignette, vignette("tgp2").

Author(s)

Robert B. Gramacy, rbg@vt.edu, and Matt Taddy, mataddy@amazon.com

References

Gramacy, R. B. (2007). tgp: An R Package for Bayesian Nonstationary, Semiparametric Nonlinear Regression and Design by Treed Gaussian Process Models. Journal of Statistical Software, 19(9). https://www.jstatsoft.org/v19/i09 doi:10.18637/jss.v019.i09

Gramacy, R. B., & Taddy, M. (2010). Categorical Inputs, Sensitivity Analysis, Optimization and Importance Tempering with tgp Version 2, an R Package for Treed Gaussian Process Models. Journal of Statistical Software, 33(6), 1–48. https://www.jstatsoft.org/v33/i06/ doi:10.18637/jss.v033.i06

Friedman, J. H. (1991). Multivariate adaptive regression splines. The Annals of Statistics, 19(1), 1–67.

Gramacy, R. B., & Lee, H. K. H. (2008). Bayesian treed Gaussian process models with an application to computer modeling. Journal of the American Statistical Association, 103(483), 1119–1130. Also available as arXiv article 0710.4536, https://arxiv.org/abs/0710.4536

Chipman, H., George, E., & McCulloch, R. (2002). Bayesian treed models. Machine Learning, 48, 303–324.

https://bobby.gramacy.com/r_packages/tgp/

See Also

bgpllm, btlm, blm, bgp, btgpllm


tgp documentation built on Sept. 11, 2024, 8:22 p.m.