xy_dgp_constructor | R Documentation |
A general DGP constructor function that generates X and y data for any supervised learning DGP, provided the functions for simulating X, y, and the additive error term.
xy_dgp_constructor(
  n,
  X_fun,
  y_fun,
  err_fun = NULL,
  add_err = TRUE,
  data_split = FALSE,
  train_prop = 0.5,
  return_values = c("X", "y", "support"),
  ...
)
n |
Number of samples. |
X_fun |
Function to generate X data. Must take an argument |
y_fun |
Function to generate y data. Must take an argument |
err_fun |
Function to generate error/noise data. Default |
add_err |
Logical. If |
data_split |
Logical; if |
train_prop |
Proportion of data in training set if |
return_values |
Character vector indicating what objects to return in list. Elements in vector must be one of "X", "y", "support". |
... |
Additional arguments to pass to functions that generate X, y, and err. If the argument doesn't exist in one of the functions it is ignored. If two or more of the functions have an argument of the same name but with different values, then use one of the following prefixes in front of the argument name (passed via |
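The prefix convention can be illustrated with a small base R sketch. The helper `route_args()` below is hypothetical and is not part of the package; it only mimics the behavior described above, assuming the prefixes `.X_`, `.y_`, and `.err_` that appear in the examples: prefixed arguments go to the matching function with the prefix stripped, and unprefixed arguments are kept only if the function accepts them.

```r
# Hypothetical helper (not the package's implementation): select the entries
# of `args` that apply to `fun`, given a prefix such as ".X_".
route_args <- function(fun, args, prefix) {
  arg_names <- names(args)
  has_prefix <- startsWith(arg_names, prefix)
  # Prefixed arguments: strip the prefix, e.g. ".X_mean" becomes "mean".
  prefixed <- args[has_prefix]
  names(prefixed) <- substring(arg_names[has_prefix], nchar(prefix) + 1)
  # Unprefixed arguments: keep only those that `fun` actually accepts.
  shared <- args[!has_prefix & arg_names %in% names(formals(fun))]
  # Prefixed values take precedence over shared ones with the same name.
  c(prefixed, shared[!names(shared) %in% names(prefixed)])
}

args <- list(n = 5, .X_mean = 10, .err_sd = 0.1, unused = "ignored")
X_args <- route_args(rnorm, args, ".X_")      # mean = 10, n = 5
err_args <- route_args(rnorm, args, ".err_")  # sd = 0.1, n = 5
x <- do.call(rnorm, X_args)
```

Here `rnorm` stands in for both the X and error generators, so `mean` and `sd` would collide without prefixes; `unused` is dropped because `rnorm` has no such argument.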
If add_err = TRUE, data is generated from the following additive model:
y = y_fun(X, ...) + err_fun(X, y_fun(X, ...), ...), where X = X_fun(...).
If add_err = FALSE, data is generated via:
y = err_fun(X, y_fun(X, ...), ...), where X = X_fun(...).
Note that err_fun() may depend on both X and y, but it need not depend on either.
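The two generation modes can be sketched in base R without calling the constructor. All function names below (`X_fun`, `y_fun`, `err_additive`, `err_bernoulli`) are hypothetical stand-ins chosen for illustration, not functions from the package. The Bernoulli case shows one situation where add_err = FALSE is natural: the "error" function consumes the signal rather than being added to it.

```r
set.seed(42)
n <- 100
p <- 3

# Hypothetical stand-ins for X_fun, y_fun, and err_fun.
X_fun <- function(n, p) as.data.frame(matrix(rnorm(n * p), nrow = n))
y_fun <- function(X, betas) as.vector(as.matrix(X) %*% betas)
err_additive <- function(n, sd) rnorm(n, sd = sd)
# This error function transforms the signal instead of being added to it.
err_bernoulli <- function(y) rbinom(length(y), size = 1, prob = plogis(y))

X <- X_fun(n, p)
signal <- y_fun(X, betas = c(1, -1, 0.5))

# add_err = TRUE: y = y_fun(X, ...) + err_fun(...)
y_additive <- signal + err_additive(n, sd = 1)

# add_err = FALSE: y = err_fun(X, y_fun(X, ...), ...)
y_binary <- err_bernoulli(signal)
```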
A list of the named objects that were requested in return_values. See brief descriptions below.
X |
A data.frame. |
y |
A response vector of length nrow(X). |
support |
A vector of feature indices indicating all features used in the true support of the DGP. |
Note that if data_split = TRUE and "X", "y" are in return_values, then the returned list also contains slots for "Xtest" and "ytest".
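The shape of a split result can be sketched as follows. This is an illustration only: it assumes rows are assigned to the training set uniformly at random and that the training size is round(train_prop * n), neither of which is confirmed by this page. Only the slot names "X", "y", "Xtest", and "ytest" come from the documentation.

```r
set.seed(1)
n <- 100
train_prop <- 0.5
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
y <- X$x1 - X$x2 + rnorm(n)

# Assumption: training rows are sampled uniformly at random.
train_idx <- sample(n, size = round(train_prop * n))
split <- list(
  X = X[train_idx, , drop = FALSE],
  y = y[train_idx],
  Xtest = X[-train_idx, , drop = FALSE],
  ytest = y[-train_idx]
)
str(split, max.level = 1)
```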
# generate X = 100 x 10 standard Gaussian, y = linear regression model
sim_data <- xy_dgp_constructor(
  X_fun = MASS::mvrnorm, y_fun = generate_y_linear, err_fun = rnorm,
  data_split = TRUE,
  # shared dgp arguments
  n = 100,
  # arguments specifically for X_fun
  .X_mu = rep(0, 10), .X_Sigma = diag(10),
  # arguments specifically for y_fun
  .y_betas = rnorm(10), .y_return_support = TRUE,
  # arguments specifically for err_fun
  .err_sd = 1
)

# or alternatively, (since arguments of X_fun, y_fun, err_fun are unique,
# with the exception of `n`)
sim_data <- xy_dgp_constructor(
  X_fun = MASS::mvrnorm, y_fun = generate_y_linear, err_fun = rnorm,
  data_split = TRUE,
  # shared dgp arguments
  n = 100,
  # arguments specifically for X_fun
  mu = rep(0, 10), Sigma = diag(10),
  # arguments specifically for y_fun
  betas = rnorm(10), return_support = TRUE,
  # arguments specifically for err_fun
  sd = 1
)