linear_gaussian_dgp | R Documentation |
Generate independent normally-distributed covariates (including potentially omitted variables) and linear response data with a specified error distribution.
linear_gaussian_dgp(
  n,
  p_obs = 0,
  p_unobs = 0,
  s_obs = p_obs,
  s_unobs = p_unobs,
  betas = NULL,
  betas_unobs = NULL,
  intercept = 0,
  err = NULL,
  data_split = FALSE,
  train_prop = 0.5,
  return_values = c("X", "y", "support"),
  ...
)
n |
Number of samples. |
p_obs |
Number of observed features. |
p_unobs |
Number of unobserved (omitted) features. |
s_obs |
Sparsity level of observed features. Coefficients corresponding
to features after the |
s_unobs |
Sparsity level of unobserved (omitted) features. Coefficients
corresponding to features after the |
betas |
Coefficient vector for observed design matrix. If a scalar is provided, the coefficient vector is constant. If |
betas_unobs |
Coefficient vector for unobserved design matrix. If a scalar is provided, the coefficient vector is constant. If |
intercept |
Scalar intercept term. |
err |
Function from which to generate simulated error vector. Default is
|
data_split |
Logical; if |
train_prop |
Proportion of data in training set if |
return_values |
Character vector indicating what objects to return in list. Elements in vector must be one of "X", "y", "support". |
... |
Additional arguments to pass to functions that generate X, U, y, betas, betas_unobs, and err. If the argument doesn't exist in one of the functions it is ignored. If two or more of the functions have an argument of the same name but with different values, then use one of the following prefixes in front of the argument name (passed via |
Data is generated via:
y = intercept + betas %*% X + betas_unobs %*% U + err(...),
where X, U are standard Gaussian random matrices and the true underlying support of this data is the first s_obs and s_unobs features in X and U respectively.
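The generating equation above can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: the constant coefficient values, the `rnorm` error with `sd = 0.5`, and the choice to report only the observed indices in `support` are picked for the example. Because `X` and `U` are stored here with one row per sample, the products are written as `X %*% betas` and `U %*% betas_unobs` so that the dimensions conform.

```r
set.seed(1)

n <- 100        # number of samples
p_obs <- 10     # observed features
p_unobs <- 2    # unobserved (omitted) features
s_obs <- 2      # observed features with non-zero coefficients
s_unobs <- 1    # unobserved features with non-zero coefficients
intercept <- 0

# Standard Gaussian design matrices, one row per sample
X <- matrix(rnorm(n * p_obs), nrow = n, ncol = p_obs)
U <- matrix(rnorm(n * p_unobs), nrow = n, ncol = p_unobs)

# Assumed coefficients: constant on the support, zero afterwards
betas <- c(rep(1, s_obs), rep(0, p_obs - s_obs))
betas_unobs <- c(rep(-1, s_unobs), rep(0, p_unobs - s_unobs))

# Assumed error distribution: N(0, sd = 0.5)
err <- rnorm(n, sd = 0.5)

# Linear response built from observed and unobserved covariates
y <- as.numeric(intercept + X %*% betas + U %*% betas_unobs + err)

# Indices of the observed features in the true support (assumption:
# only observed indices are reported)
support <- seq_len(s_obs)
```

Only `X` and `y` would be visible to a fitted model; `U` contributes to `y` but is omitted from the observed data, which is what makes this setup useful for studying omitted-variable effects.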
A list of the named objects that were requested in return_values. See brief descriptions below.
X |
A data.frame. |
y |
A response vector of length nrow(X). |
support |
A vector of feature indices indicating all features used in the true support of the DGP. |
Note that if data_split = TRUE and "X", "y" are in return_values, then the returned list also contains slots for "Xtest" and "ytest".
# generate data from: y = betas_1 * x_1 + betas_2 * x_2 + N(0, 0.5), where
# betas_1, betas_2 ~ N(0, 1) and X ~ N(0, I_10)
sim_data <- linear_gaussian_dgp(n = 100, p_obs = 10, s_obs = 2, betas_sd = 1,
                                err = rnorm, sd = .5)

# generate data from y = betas %*% X - u_1 + t(df = 1), where
# betas ~ N(0, .5), betas_unobs = [-1, 0], X ~ N(0, I_10), U ~ N(0, I_2)
sim_data <- linear_gaussian_dgp(n = 100, p_obs = 10, p_unobs = 2,
                                betas_sd = .5, betas_unobs = c(-1, 0),
                                err = rt, df = 1)
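The train/test partition controlled by data_split and train_prop can be pictured with a small base R sketch. It assumes a uniformly random row split and uses a stand-in data set; whether the package splits randomly or by row order is not stated in this page, so treat the splitting rule as an assumption. The slot names mirror those listed in the Value section.

```r
set.seed(2)

n <- 100
train_prop <- 0.5

# Stand-in data: a data.frame of covariates and a response vector
X <- as.data.frame(matrix(rnorm(n * 3), nrow = n))
y <- rnorm(n)

# Assumed rule: draw round(train_prop * n) row indices at random
train_idx <- sample(n, size = round(train_prop * n))

# List with training and test slots
out <- list(
  X = X[train_idx, , drop = FALSE],
  y = y[train_idx],
  Xtest = X[-train_idx, , drop = FALSE],
  ytest = y[-train_idx]
)
```

With train_prop = 0.5 and n = 100, each of the training and test sets holds 50 rows.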