In Yu-Group/dgpoix: Generate synthetic data that is as fresh as the real thing

logistic_gaussian_dgp

R Documentation

Generate independent Gaussian covariates and (binary) logistic response data.

Description

Generate independent normally-distributed covariates and logistic response data.

Usage

logistic_gaussian_dgp(
  n,
  p,
  s = p,
  betas = NULL,
  betas_sd = 1,
  intercept = 0,
  data_split = FALSE,
  train_prop = 0.5,
  return_values = c("X", "y", "support"),
  ...
)

Arguments

`n`	Number of samples.
`p`	Number of features.
`s`	Sparsity level, i.e., the number of leading features that may have nonzero coefficients. Coefficients at positions i = `s` + 1, ..., `p` are set to 0.
`betas`	Coefficient vector for observed design matrix. If a scalar is provided, the coefficient vector is constant. If `NULL` (default), entries in the coefficient vector are drawn iid from N(0, `betas_sd`^2). Can also be a function that generates the coefficient vector; see `generate_coef()`.
`betas_sd`	(Optional) SD of the normal distribution from which `betas` is drawn. Only used if `betas` is `NULL` or a function; in the latter case, `betas_sd` is optionally passed to the function as `sd` (see `generate_coef()`).
`intercept`	Scalar intercept term.
`data_split`	Logical; if `TRUE`, splits data into training and test sets according to `train_prop`.
`train_prop`	Proportion of data in training set if `data_split = TRUE`.
`return_values`	Character vector indicating what objects to return in list. Elements in vector must be one of "X", "y", "support".
`...`	Not used.

Details

Data is generated via:

log(p / (1 - p)) = intercept + X %*% betas,

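where p = P(y = 1 | X) is the success probability (not to be confused with the argument `p`, the number of features), X is a random matrix with independent standard Gaussian entries, and the true underlying support of the data is the first `s` features in X (unless specified otherwise by `betas`).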
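The model above can be sketched in base R as follows. This is a minimal illustration of the stated formula, not the package's own implementation: the variable names (`eta`, `prob`) are hypothetical, and drawing exactly `s` coefficients and zero-padding the rest is an assumption based on the descriptions of `s` and `betas`.

```r
set.seed(1)
n <- 100          # number of samples
p <- 10           # number of features
s <- 2            # sparsity level
intercept <- 0
betas_sd <- 1

# n x p matrix of independent N(0, 1) covariates
X <- matrix(rnorm(n * p), nrow = n, ncol = p)

# assumed coefficient layout: first s entries drawn from N(0, betas_sd^2),
# remaining p - s entries set to 0
betas <- c(rnorm(s, mean = 0, sd = betas_sd), rep(0, p - s))

# log-odds for each sample, then the implied success probabilities
eta <- intercept + as.vector(X %*% betas)
prob <- plogis(eta)  # inverse logit: 1 / (1 + exp(-eta))

# binary response: one Bernoulli draw per sample
y <- rbinom(n, size = 1, prob = prob)

# indices of the features with nonzero coefficients
support <- which(betas != 0)
```

Here `plogis()` inverts the logit link, so `prob` plays the role of P(y = 1 | X) in the formula.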

Value

A list of the named objects that were requested in `return_values`. See brief descriptions below.

`X`: A `data.frame`.
`y`: A response vector of length `nrow(X)`.
`support`: A vector of feature indices indicating all features used in the true support of the DGP.

Note that if `data_split = TRUE` and "X", "y" are in `return_values`, then the returned list also contains slots for "Xtest" and "ytest".

Examples

# generate data from: log(p / (1 - p)) = betas_1 * x_1 + betas_2 * x_2, where
# betas_1, betas_2 ~ N(0, 1) and X ~ N(0, I_10)
sim_data <- logistic_gaussian_dgp(n = 100, p = 10, s = 2, betas_sd = 1)
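
A further sketch exercising the scalar `betas`, `data_split`, `train_prop`, and `return_values` arguments described above. It assumes the package is installed and loads under the name `dgpoix` (taken from the repository name in the page header); the `requireNamespace()` guard simply skips the calls otherwise. The object names `sim_split` and `sim_xy` are illustrative.

```r
# Assumption: the package namespace is called "dgpoix".
if (requireNamespace("dgpoix", quietly = TRUE)) {
  # constant coefficient of 1 on the first 3 of 10 features, intercept -0.5,
  # with the data split into training and test sets (train_prop = 0.75)
  sim_split <- dgpoix::logistic_gaussian_dgp(
    n = 200, p = 10, s = 3, betas = 1, intercept = -0.5,
    data_split = TRUE, train_prop = 0.75
  )
  # per the Value section, this should include "Xtest" and "ytest"
  print(names(sim_split))

  # request only the covariates and the response
  sim_xy <- dgpoix::logistic_gaussian_dgp(
    n = 100, p = 10, s = 2, return_values = c("X", "y")
  )
  print(names(sim_xy))
}
```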

Yu-Group/dgpoix documentation built on June 3, 2022, 1:40 a.m.