generate_y_lss | R Documentation |
Generate LSS response data with a specified error distribution, given an observed data matrix.
generate_y_lss( X, k, s, thresholds = 1, signs = 1, betas = 1, intercept = 0, overlap = FALSE, err = NULL, return_support = FALSE, ... )
X |
Data matrix or data frame. |
k |
Order of the interactions. |
s |
Either the number of interactions in the LSS model, or a matrix of support indices in which each row holds the feature indices of one interaction (so ncol = k). |
thresholds |
A scalar or an s x k matrix of thresholds, one for each feature in each interaction term of the LSS model. |
signs |
A scalar or an s x k matrix giving the direction of each comparison in each interaction (1 means > and -1 means <). |
betas |
Scalar, vector, or function used to generate the coefficients of the interaction terms. See generate_coef(). |
intercept |
Scalar intercept term. |
overlap |
If TRUE, simulate support indices with replacement; if FALSE, simulate support indices without replacement, so the supports do not overlap. |
err |
Function used to generate the simulated error vector. Default is NULL, in which case no error is added. |
return_support |
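Logical specifying whether or not to also return the true support of the data-generating process. If TRUE, a list is returned instead of only the response vector (see the return value described below). |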
... |
Other arguments passed to err() when generating the error vector. |
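When s is given as a number, the support indices have to be drawn, and overlap decides whether a feature may be reused. The following is a minimal base-R sketch of one way this could work, assuming features are drawn uniformly at random from the columns of X. The helper sample_lss_support() is hypothetical and not part of the package; the package's actual sampling scheme may differ.

```r
# Hypothetical helper (not the package's API): draw an s x k matrix of
# support indices from the p columns of X, one interaction per row.
sample_lss_support <- function(p, s, k, overlap = FALSE) {
  if (!overlap && s * k > p) {
    stop("need s * k <= p to sample without replacement")
  }
  # overlap = FALSE: all s * k indices are distinct, so no feature is shared
  # overlap = TRUE: indices are drawn with replacement and may repeat
  # (in this simple sketch, even within a single interaction)
  idx <- sample.int(p, size = s * k, replace = overlap)
  matrix(idx, nrow = s, ncol = k, byrow = TRUE)
}

set.seed(1)
S <- sample_lss_support(p = 10, s = 2, k = 2)
```

A matrix like S can then be passed directly as the s argument, which fixes the support instead of sampling it.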
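Here, data are generated from the following LSS model:
$$E(Y \mid X) = \text{intercept} + \sum_{i=1}^{s} \beta_i \prod_{j=1}^{k} \mathbf{1}\left(X_{S_{ij}} \lessgtr \text{thresholds}_{ij}\right),$$
where S_{ij} is the j-th feature index of the i-th interaction and the direction of each comparison (> or <) is set by signs_{ij}. The observed response is this mean plus the error vector generated by err.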
For more details on the LSS model, see Behr, Merle, et al. "Provable Boolean Interaction Recovery from Tree Ensemble obtained via Random Forests." arXiv preprint arXiv:2102.11800 (2021).
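To make the formula concrete, the sketch below evaluates E(Y|X) in base R for a given support matrix. The helper lss_mean() is hypothetical and not part of the package; it assumes scalar thresholds, signs, and betas are recycled to full size, mirroring how the arguments above accept either a scalar or a full matrix/vector.

```r
# Hypothetical helper (not the package's API): evaluate
# E(Y|X) = intercept + sum_i beta_i * prod_j 1(X[, S[i, j]] > or < thresholds[i, j])
lss_mean <- function(X, S, thresholds, signs, betas, intercept = 0) {
  X <- as.matrix(X)
  s <- nrow(S)
  k <- ncol(S)
  # Recycle scalar arguments to an s x k matrix or a length-s vector
  thresholds <- matrix(thresholds, nrow = s, ncol = k)
  signs <- matrix(signs, nrow = s, ncol = k)
  betas <- rep_len(betas, s)
  mu <- rep(intercept, nrow(X))
  for (i in seq_len(s)) {
    term <- rep(TRUE, nrow(X))
    for (j in seq_len(k)) {
      x <- X[, S[i, j]]
      # sign 1 means "greater than", sign -1 means "less than"
      hit <- if (signs[i, j] == 1) x > thresholds[i, j] else x < thresholds[i, j]
      term <- term & hit  # product of indicators = logical AND
    }
    mu <- mu + betas[i] * as.numeric(term)
  }
  mu
}

# Toy data for y = 3 * 1(X_1 < 0) - 1(X_2 > 1), without noise
X_toy <- matrix(c(-1, 2, 0.5, -0.5,
                  3, -2, 1, 0.5), nrow = 4)
mu <- lss_mean(X_toy, S = matrix(1:2, nrow = 2),
               thresholds = matrix(0:1, nrow = 2),
               signs = matrix(c(-1, 1), nrow = 2),
               betas = c(3, -1))
```

Here mu is c(2, 0, 0, 3). With err = rnorm, a noisy response would correspond to mu + rnorm(nrow(X_toy)).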
If return_support = TRUE, returns a list of three elements:
A response vector of length nrow(X).
A vector of feature indices indicating all features used in the true support of the data-generating process (DGP).
A vector of signed feature indices in the true (interaction) support of the DGP. For example, "1+_2-" means that the interaction between high values of feature 1 and low values of feature 2 appears in the underlying DGP.
If return_support = FALSE, returns only the response vector y.
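The signed-support labels can be read as one string per interaction. The sketch below shows how such labels could be built from a support matrix and a signs matrix, assuming "+" marks a > comparison, "-" marks a < comparison, and features within an interaction are joined by "_", as in the "1+_2-" example. The helper signed_support_labels() is hypothetical; the package's exact output format may differ.

```r
# Hypothetical helper (not the package's API): one "1+_2-"-style label
# per interaction (row of S).
signed_support_labels <- function(S, signs) {
  signs <- matrix(signs, nrow = nrow(S), ncol = ncol(S))
  # ifelse() keeps the matrix shape of its test argument
  marks <- ifelse(signs == 1, "+", "-")
  vapply(seq_len(nrow(S)), function(i) {
    paste0(S[i, ], marks[i, ], collapse = "_")
  }, character(1))
}

labels <- signed_support_labels(S = matrix(c(1, 2), nrow = 1),
                                signs = matrix(c(1, -1), nrow = 1))
# All features used in the support, without signs
support <- sort(unique(as.vector(matrix(c(1, 2), nrow = 1))))
```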
X <- generate_X_gaussian(.n = 100, .p = 10)

# generate data from: y = 1(X_1 > 0, X_2 > 0) + 1(X_3 > 0, X_4 > 0)
y <- generate_y_lss(X = X, k = 2, s = matrix(1:4, nrow = 2, byrow = TRUE),
                    thresholds = 0, signs = 1, betas = 1)

# generate data from: y = 3 * 1(X_1 < 0) - 1(X_2 > 1) + N(0, 1)
y <- generate_y_lss(X = X, k = 1, s = matrix(1:2, nrow = 2),
                    thresholds = matrix(0:1, nrow = 2),
                    signs = matrix(c(-1, 1), nrow = 2),
                    betas = c(3, -1), err = rnorm)