correlated_logistic_gaussian_dgp | R Documentation |
Generate normally-distributed covariates that are potentially correlated and (binary) logistic response data.
correlated_logistic_gaussian_dgp(
  n,
  p_uncorr,
  p_corr,
  s_uncorr = p_uncorr,
  s_corr = p_corr,
  corr,
  betas_uncorr = NULL,
  betas_corr = NULL,
  betas_uncorr_sd = 1,
  betas_corr_sd = 1,
  intercept = 0,
  data_split = FALSE,
  train_prop = 0.5,
  return_values = c("X", "y", "support"),
  ...
)
n |
Number of samples. |
p_uncorr |
Number of uncorrelated features. |
p_corr |
Number of features in correlated group. |
s_uncorr |
Sparsity level of features in uncorrelated group. Coefficients corresponding to features after the s_uncorr-th position are set to 0. |
s_corr |
Sparsity level of features in correlated group. Coefficients corresponding to features after the s_corr-th position are set to 0. |
corr |
Correlation between features in correlated group. |
betas_uncorr |
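Coefficient vector for uncorrelated features. If a scalar is provided, the coefficient vector is constant. If NULL (default), the coefficients are drawn from a normal distribution with mean 0 and SD betas_uncorr_sd. |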
betas_corr |
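Coefficient vector for correlated features. If a scalar is provided, the coefficient vector is constant. If NULL (default), the coefficients are drawn from a normal distribution with mean 0 and SD betas_corr_sd. |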
betas_uncorr_sd |
(Optional) SD of normal distribution from which to draw betas_uncorr. |
betas_corr_sd |
(Optional) SD of normal distribution from which to draw betas_corr. |
intercept |
Scalar intercept term. |
data_split |
Logical; if TRUE, the data is split into training and test sets according to train_prop. |
train_prop |
Proportion of data in training set if data_split = TRUE. |
return_values |
Character vector indicating what objects to return in list. Elements in vector must be one of "X", "y", "support". |
... |
Not used. |
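The coefficient arguments above can be read as a small recipe: expand a scalar to a constant vector, draw from a normal distribution when nothing is supplied, and zero out everything past the sparsity level. The following is a minimal sketch of that reading, not the package's code. The helper name make_betas_sketch is hypothetical, and the NULL behavior (iid draws with mean 0 and the given SD) is an assumption based on the argument descriptions.

```r
# Hypothetical helper illustrating how a coefficient vector might be built.
# Assumes p >= 1 and 0 <= s <= p; this is NOT the package's implementation.
make_betas_sketch <- function(p, s = p, betas = NULL, betas_sd = 1) {
  if (is.null(betas)) {
    # Assumption: unspecified coefficients are drawn iid from N(0, betas_sd^2).
    betas <- rnorm(p, mean = 0, sd = betas_sd)
  } else if (length(betas) == 1) {
    # A scalar is recycled into a constant coefficient vector.
    betas <- rep(betas, p)
  }
  stopifnot(length(betas) == p)
  # Features after the s-th position are outside the support.
  if (s < p) {
    betas[(s + 1):p] <- 0
  }
  betas
}

# Constant coefficient 2 on the first feature only, out of four features.
make_betas_sketch(p = 4, s = 1, betas = 2)
```
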
Data is generated via:
log(p / (1 - p)) = intercept + betas_uncorr %*% X_uncorr + betas_corr %*% X_corr,
where p = P(y = 1 | X), X_uncorr is an (uncorrelated) standard Gaussian random matrix, and X_corr is a correlated Gaussian random matrix with variance 1 and Cor(X_corr_i, X_corr_j) = corr for all i != j. The true underlying support of this data consists of the first s_uncorr features in X_uncorr and the first s_corr features in X_corr.
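To make the generating model concrete, here is a minimal base R sketch of one way such data could be simulated. It is an illustration under stated assumptions, not the package's implementation: the function name sketch_correlated_logistic_dgp is hypothetical, the correlated block is built with a Cholesky factor, and the sketch assumes p_uncorr >= 1, p_corr >= 1, and 0 <= corr < 1.

```r
# Hypothetical sketch of the data-generating process described above.
# Assumes p_uncorr >= 1, p_corr >= 1, and 0 <= corr < 1 (so Sigma is
# positive definite). This is NOT the package's implementation.
sketch_correlated_logistic_dgp <- function(n, p_uncorr, p_corr, corr,
                                           betas_uncorr, betas_corr,
                                           intercept = 0) {
  # Uncorrelated block: iid standard Gaussian entries.
  X_uncorr <- matrix(rnorm(n * p_uncorr), nrow = n, ncol = p_uncorr)

  # Correlated block: unit variances and constant pairwise correlation.
  Sigma <- matrix(corr, nrow = p_corr, ncol = p_corr)
  diag(Sigma) <- 1
  Z <- matrix(rnorm(n * p_corr), nrow = n, ncol = p_corr)
  X_corr <- Z %*% chol(Sigma)  # rows have covariance t(R) %*% R = Sigma

  # Linear predictor on the log-odds scale, then Bernoulli responses.
  eta <- intercept +
    drop(X_uncorr %*% betas_uncorr) +
    drop(X_corr %*% betas_corr)
  y <- rbinom(n, size = 1, prob = plogis(eta))

  list(X = data.frame(cbind(X_uncorr, X_corr)), y = y)
}

set.seed(123)
sketch <- sketch_correlated_logistic_dgp(
  n = 100, p_uncorr = 3, p_corr = 2, corr = 0.7,
  betas_uncorr = c(1, 0, 0), betas_corr = c(-1, 0)
)
str(sketch$X)
table(sketch$y)
```

Here plogis() is the inverse of the logit link, so prob = plogis(eta) corresponds to log(p / (1 - p)) = eta.
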
A list of the named objects that were requested in return_values. See brief descriptions below.
X: A data.frame.
y: A response vector of length nrow(X).
support: A vector of feature indices indicating all features used in the true support of the DGP.
Note that if data_split = TRUE and "X", "y" are in return_values, then the returned list also contains slots for "Xtest" and "ytest".
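As a rough illustration of the shape of the returned list when the data is split, the sketch below partitions a data set into training and test parts. The helper name split_sketch is hypothetical, and assigning rows to the training set at random is an assumption; the documentation does not state how the package chooses the rows.

```r
# Hypothetical helper showing the X / y / Xtest / ytest layout.
# Assumption: rows are assigned to the training set at random.
split_sketch <- function(X, y, train_prop = 0.5) {
  n <- nrow(X)
  train_idx <- sample(n, size = round(train_prop * n))
  # A logical mask avoids the empty-negative-index pitfall when a set is empty.
  is_train <- seq_len(n) %in% train_idx
  list(
    X = X[is_train, , drop = FALSE],
    y = y[is_train],
    Xtest = X[!is_train, , drop = FALSE],
    ytest = y[!is_train]
  )
}

X <- data.frame(a = rnorm(10), b = rnorm(10))
y <- rbinom(10, size = 1, prob = 0.5)
parts <- split_sketch(X, y, train_prop = 0.7)
names(parts)
```
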
# generate data from: log(p / (1 - p)) = betas_corr_1 * x_corr_1 + betas_corr_2 * x_corr_2,
# where betas_corr_1, betas_corr_2 ~ N(0, 1),
# Var(X_corr_i) = 1, Cor(X_corr_i, X_corr_j) = 0.7 for all i != j in 1, ..., 10
sim_data <- correlated_logistic_gaussian_dgp(n = 100, p_uncorr = 0, p_corr = 10,
                                             s_corr = 2, corr = 0.7)

# generate data from: log(p / (1 - p)) = betas_uncorr %*% X_uncorr - X_corr_1,
# where betas_uncorr ~ N(0, 1) (since betas_uncorr_sd = 1), betas_corr = [-1, 0],
# X_uncorr ~ N(0, I_10), X_corr ~ N(0, Sigma),
# Sigma has 1s on diagonals and 0.7 elsewhere.
sim_data <- correlated_logistic_gaussian_dgp(n = 100, p_uncorr = 10, p_corr = 2,
                                             corr = 0.7, betas_uncorr_sd = 1,
                                             betas_corr = c(-1, 0))