GenSyntheticLogistic: Generate Logistic Synthetic Data

View source: R/genlogistic.R

GenSyntheticLogistic    R Documentation

Generate Logistic Synthetic Data

Description

Generates a synthetic dataset as follows:

1) Generate an n x p data matrix X whose rows are drawn independently from a multivariate Gaussian distribution with mean 0 and correlation matrix sigma (the identity I by default).
2) Generate a coefficient vector B of length p with k entries set to 1 and the remaining p - k entries set to 0.
3) Sample each coordinate y_i of the outcome vector y in {-1, 1}^n independently from a Bernoulli distribution with success probability

   P(y_i = 1 | x_i) = 1 / (1 + exp(-s <x_i, B>)),

   where x_i is the i-th row of X and <x_i, B> is the inner product of x_i and B; otherwise y_i = -1.

Source: https://arxiv.org/pdf/2001.06471.pdf, Section 5.1 (Data Generation).
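The three steps above can be sketched in base R as follows. This is an illustrative sketch, not the L0Learn source: the helper gen_logistic_sketch is hypothetical, the rho and shuffle_B options are omitted, and the Gaussian rows are drawn through a Cholesky factor of sigma, which may differ from the sampler the package uses.

```r
# Hypothetical helper: a sketch of the three documented steps,
# not the L0Learn implementation (rho and shuffle_B are omitted).
gen_logistic_sketch <- function(n, p, k, seed, s = 1, sigma = NULL) {
  set.seed(seed)
  # Step 1: rows of X ~ N(0, sigma). If Z has i.i.d. N(0, 1) entries and
  # R = chol(sigma) (upper triangular, t(R) %*% R == sigma), then each
  # row of Z %*% R has covariance t(R) %*% R = sigma.
  Z <- matrix(rnorm(n * p), nrow = n, ncol = p)
  X <- if (is.null(sigma)) Z else Z %*% chol(sigma)
  # Step 2: B has k ones followed by p - k zeros.
  B <- c(rep(1, k), rep(0, p - k))
  # Step 3: P(y_i = 1 | x_i) = 1 / (1 + exp(-s <x_i, B>)); otherwise y_i = -1.
  prob <- 1 / (1 + exp(-s * drop(X %*% B)))
  y <- ifelse(runif(n) < prob, 1, -1)
  list(X = X, y = y, B = B)
}

sim <- gen_logistic_sketch(n = 200, p = 50, k = 5, seed = 1)
dim(sim$X)
table(sim$y)
```

With sigma = NULL the columns of X are independent standard normals. For a correlated design, sigma must be a symmetric positive definite p x p matrix, since chol() stops with an error otherwise.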

Usage

GenSyntheticLogistic(
  n,
  p,
  k,
  seed,
  rho = 0,
  s = 1,
  sigma = NULL,
  shuffle_B = FALSE
)

Arguments

n

Number of samples

p

Number of features

k

Number of nonzero entries in the true coefficient vector B

seed

The seed used for randomly generating the data

rho

The threshold for setting entries of X to 0: if |X(i, j)| < rho, then X(i, j) <- 0. The default rho = 0 leaves X unchanged.

s

Signal-to-noise parameter. As s -> +Inf, the data generated becomes linearly separable.
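To see why, fix a margin m = <x_i, B> and vary s. The base R snippet below (illustrative values, not package code) evaluates the success probability from the Description: for m > 0 it rises toward 1 as s grows, and for m < 0 it falls toward 0 by symmetry. In the limit y_i = sign(<x_i, B>), so the hyperplane <x, B> = 0 separates the two classes. At s = 0 the probability is 1/2 regardless of X.

```r
# Success probability 1 / (1 + exp(-s * m)) for a fixed margin m = <x_i, B>.
m <- 0.5
s_values <- c(0, 1, 10, 100)
prob <- 1 / (1 + exp(-s_values * m))
round(prob, 4)
# [1] 0.5000 0.6225 0.9933 1.0000
```

The expression is the standard logistic function, so stats::plogis(s_values * m) gives the same values.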

sigma

The p x p correlation matrix of the rows of X. If NULL (the default), the identity matrix I is used.

shuffle_B

Logical. If TRUE, the entries of the coefficient vector B are randomly shuffled; if FALSE, the first k entries of B are set to 1.
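A minimal sketch of the two layouts of B. The helper make_B is hypothetical, and using sample() to draw a random permutation is an assumption about how the shuffle is done; either way the shuffled vector still contains exactly k ones, only their positions change.

```r
# Hypothetical helper illustrating the documented shuffle_B behaviour.
make_B <- function(p, k, shuffle_B = FALSE) {
  B <- c(rep(1, k), rep(0, p - k))  # first k entries are 1
  if (shuffle_B) B <- sample(B)     # random permutation keeps exactly k ones
  B
}

make_B(6, 2)
# [1] 1 1 0 0 0 0
set.seed(3)
make_B(6, 2, shuffle_B = TRUE)      # same entries in a random order
```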

Value

A list containing the n x p data matrix X, the response vector y of length n, and the coefficient vector B of length p.


L0Learn documentation built on March 7, 2023, 8:18 p.m.