The function sample.bin generates a random sample of n observations with p predictors X and a binary response Y from a logistic model. The response Y is drawn as a Bernoulli random variable with parameter logit^{-1}(XB), the coefficients B are sparse, and the design matrix X is composed of correlated blocks of predictors.
sample.bin(n, p, kstar, lstar, beta.min, beta.max, mean.H=0, sigma.H, sigma.F, seed=NULL)

n 
the number of observations in the sample. 
p 
the number of covariates in the sample. 
kstar 
the number of underlying latent variables used to generate the design matrix. 
lstar 
the number of blocks in the design matrix whose predictors have non-null coefficients (see details). 
beta.min 
the lower bound for non-null coefficients (see details). 
beta.max 
the upper bound for non-null coefficients (see details). 
mean.H 
the mean of the latent variables used to generate the design matrix. 
sigma.H 
the standard deviation of the latent variables used to generate the design matrix. 
sigma.F 
the standard deviation of the noise added to the latent variables used to generate the design matrix. 
seed 
a positive integer; if non-NULL, it fixes the seed for random number generation. 

The set (1:p) of predictors is partitioned into kstar blocks. Each block k (k=1,...,kstar) depends on a latent variable H.k; the H.k are independent and identically distributed as N(mean.H, sigma.H^2). Each column X.j of the matrix X is generated as H.k + F.j for j in block k, where the F.j are independent and identically distributed Gaussian noise N(0, sigma.F^2).
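The block construction described above can be sketched in base R. This is an illustration only, not the plsgenomics implementation: the contiguous, equal-sized blocks and the helper names H, F.noise and block.partition are assumptions made for the example.

```r
# Illustrative sketch only; not the code used inside sample.bin.
set.seed(1)
n <- 50; p <- 12; kstar <- 3
mean.H <- 0; sigma.H <- 10; sigma.F <- 1

# Assumption: contiguous blocks of equal size (how sample.bin
# actually partitions 1:p is not specified here).
block.partition <- sort(rep(seq_len(kstar), length.out = p))

# One latent variable per block, i.i.d. N(mean.H, sigma.H^2): H is (n x kstar).
H <- matrix(rnorm(n * kstar, mean = mean.H, sd = sigma.H),
            nrow = n, ncol = kstar)

# Independent Gaussian noise for every column, N(0, sigma.F^2): (n x p).
F.noise <- matrix(rnorm(n * p, mean = 0, sd = sigma.F),
                  nrow = n, ncol = p)

# X.j = H.k + F.j for each predictor j in block k.
X <- H[, block.partition, drop = FALSE] + F.noise
```

Columns within the same block share a latent variable and are therefore strongly correlated when sigma.H is large relative to sigma.F, whereas columns from different blocks are generated independently.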
The coefficients B are drawn at random between beta.min and beta.max on lstar randomly chosen blocks, and are null otherwise. The variables with non-null coefficients are therefore relevant for explaining the response, whereas those with null coefficients are not.
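A minimal sketch of the sparse coefficient vector, under stated assumptions: the text above does not say which distribution is used between beta.min and beta.max, so a uniform draw is assumed here, and the block layout is a hypothetical contiguous partition. The names sel, nosel, block.sel and p0 mirror the components of the returned list.

```r
# Illustrative sketch only; not the code used inside sample.bin.
set.seed(2)
p <- 12; kstar <- 3; lstar <- 2
beta.min <- 0.25; beta.max <- 0.75

# Assumption: contiguous blocks of equal size.
block.partition <- sort(rep(seq_len(kstar), length.out = p))

# Randomly choose lstar of the kstar blocks to carry non-null coefficients.
block.sel <- sort(sample(seq_len(kstar), size = lstar))

# Predictors in the selected blocks, and the remaining ones.
sel   <- which(block.partition %in% block.sel)
nosel <- setdiff(seq_len(p), sel)

# Null everywhere except on the selected predictors.
B <- numeric(p)
# Assumption: uniform draw on [beta.min, beta.max].
B[sel] <- runif(length(sel), min = beta.min, max = beta.max)

p0 <- length(sel)  # number of predictors with non-null coefficients
```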
The response is generated by drawing a Bernoulli random variable with parameter logit^{-1}(XB).
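The response step can be illustrated with base R alone. This is a sketch, not the package code: the small X and B below are placeholders invented for the example, and plogis from the stats package is used as the inverse logit.

```r
# Illustrative sketch only; X and B are placeholder values.
set.seed(3)
n <- 50; p <- 4
X <- matrix(rnorm(n * p), nrow = n, ncol = p)  # placeholder design matrix
B <- c(0.5, 0.5, 0, 0)                         # placeholder sparse coefficients

# Linear predictor XB, one value per observation.
eta <- as.vector(X %*% B)

# Inverse logit: plogis(eta) equals 1 / (1 + exp(-eta)).
proba <- plogis(eta)

# One Bernoulli draw per observation, with parameter proba[i].
Y <- rbinom(n, size = 1, prob = proba)
```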
The details of the procedure are developed in Durif et al. (2015).
A list with the following components:
X 
the (n x p) design matrix, containing the n observations of the p predictors. 
Y 
the (n) vector of Y observations. 
proba 
the (n) vector of Bernoulli parameters used to generate the response, namely logit^{-1}(XB). 
sel 
the index in (1:p) of covariates with non-null coefficients in B. 
nosel 
the index in (1:p) of covariates with null coefficients in B. 
B 
the (p) vector of coefficients. 
block.partition 
a (p) vector indicating the block in (1:kstar) of each predictor. 
p 
the number of covariates in the sample. 
kstar 
the number of underlying latent variables used to generate the design matrix. 
lstar 
the number of blocks in the design matrix whose predictors have non-null coefficients (see details). 
p0 
the number of predictors with non-null coefficients in B. 
block.sel 
a (lstar) vector indicating the index in (1:kstar) of blocks with predictors having non-null coefficients in B. 
beta.min 
the lower bound for non-null coefficients (see details). 
beta.max 
the upper bound for non-null coefficients (see details). 
mean.H 
the mean of the latent variables used to generate the design matrix. 
sigma.H 
the standard deviation of the latent variables used to generate the design matrix. 
sigma.F 
the standard deviation of the noise added to the latent variables used to generate the design matrix. 
seed 
a positive integer; if non-NULL, it fixes the seed for random number generation. 

Ghislain Durif (http://lbbe.univlyon1.fr/DurifGhislain.html).
G. Durif, F. Picard, S. Lambert-Lacroix (2015). Adaptive sparse PLS for logistic regression, (in prep), available on (http://arxiv.org/).
### load plsgenomics library
library(plsgenomics)
### generating data
n <- 100
p <- 1000
sample1 <- sample.bin(n=n, p=p, kstar=20, lstar=2, beta.min=0.25, beta.max=0.75, mean.H=0.2,
                      sigma.H=10, sigma.F=5)
str(sample1)
