View source: R/mixture_generator.R
Generates a dataset (with an additional validation sample) of Gaussian mixture covariates, some of which are generated by sub-regressions on others. A response variable is then added by linear regression. This function generates datasets for simulations with CorReg, or simply datasets made of Gaussian mixtures.
mixture_generator(n = 130, p = 100, ratio = 0.4, max_compl = 1,
  valid = 1000, positive = 0.6, sigma_Y = 10, sigma_X = NULL,
  R2 = NULL, R2Y = 0.4, meanvar = NULL, sigmavar = NULL, lambda = 3,
  Amax = NULL, lambdapois = 10, gamma = FALSE, gammashape = 1,
  gammascale = 0.5, tp1 = 1, tp2 = 1, tp3 = 1, nonlin = 0,
  pnonlin = 2, scale = TRUE, Z = NULL)
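A call might look like the following minimal sketch. The argument names come from the usage signature and the returned element names from the Value section; the specific values are illustrative assumptions, not recommended settings, and the call is guarded so the snippet does nothing if CorReg is not installed.

```r
# Illustrative argument values (assumptions); names come from the usage signature.
gen_args <- list(n = 15, p = 10, ratio = 0.4, max_compl = 3,
                 positive = 0.5, R2 = 0.9, R2Y = 0.8,
                 lambda = 1, scale = TRUE)

# Only call the generator when the CorReg package is available.
if (requireNamespace("CorReg", quietly = TRUE)) {
  sim <- do.call(CorReg::mixture_generator, gen_args)

  # Inspect the learning and validation sets returned in the list.
  str(sim[c("X_appr", "Y_appr", "X_test", "Y_test")])

  # Adjacency matrix describing which covariates explain which.
  print(sim$Z)
}
```

Passing the arguments through a list and `do.call` keeps the simulation settings in one object, which can be convenient when the same configuration is reused across several runs.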
| n | the number of individuals in the learning dataset | 
| p | the number of covariates (without the response) | 
| ratio | the ratio of covariates generated by sub-regressions on others | 
| max_compl | the number of covariates in each sub-regression | 
| valid | the number of individuals in the validation sample | 
| positive | the ratio of positive coefficients in both the regression and the sub-regressions | 
| sigma_Y | the standard deviation for the noise of the regression | 
| sigma_X | the standard deviation for the noise of the sub-regressions (all). ignored if  | 
| R2 | the strength of the sub-regressions (coefficients will be chosen to obtain this value). | 
| R2Y | the strength of the main regression (coefficients will be chosen to obtain this value). | 
| meanvar | vector of means for the covariates. | 
| sigmavar | standard deviation of the covariates. | 
| lambda | parameter of the Poisson distribution that defines the number of components in the Gaussian mixture models | 
| Amax | the maximum number of covariates with non-zero coefficients in the regression | 
| lambdapois | parameter of the Poisson distribution used to generate the coefficients in the sub-regressions | 
| gamma | (boolean) to generate a p-sized vector  | 
| gammashape | shape parameter of the gamma distribution (if needed) | 
| gammascale | scale parameter of the gamma distribution (if needed) | 
| tp1 | the ratio of right-side (explanatory) covariates allowed to have a non-zero coefficient in the regression | 
| tp2 | the ratio of left-side (redundant) covariates allowed to have a non-zero coefficient in the regression | 
| tp3 | the ratio of strictly independent covariates allowed to have a non-zero coefficient in the regression | 
| nonlin | whether to use a nonlinear structure (power or log). If non-zero, it is the probability of using the power pnonlin instead of log. The type is drawn for each link between covariates | 
| pnonlin | the power used when the structure is nonlinear | 
| scale | (boolean) to scale X before computing Y | 
| Z | the binary square adjacency matrix (size p) to obtain. If NULL it is randomly generated, based on  | 
a list that contains:
| X_appr | matrix of the learning set.  | 
| Y_appr | Response variable vector (size  | 
| A | vector of the coefficients of the regression generating  | 
| B | Matrix of the coefficients of sub-regressions (first row: the intercepts) then  | 
| Z | Binary square adjacency matrix of size  | 
| X_test | validation sample generated the same way as  | 
| Y_test | Response vector associated to the validation sample | 
| sigma_X | Vector of the standard deviations of the residuals of the sub-regressions (one value for each sub-regression) | 
| sigma_Y | Standard deviation of the residual of the regression that generates  | 
| nbcomp | vector of the number of components for covariates that are not explained by others. | 
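The structure behind these outputs can be sketched in base R without CorReg. This is a simplified illustration, not the package's implementation: the fixed structure, the coefficient values, the helper `draw_mixture`, the orientation of `Z` (row explains column), and the intercept-first layout of `A` are all assumptions made for the example.

```r
set.seed(42)  # reproducible draws

n <- 200  # individuals
p <- 5    # covariates

# Assumed structure: covariate 4 is explained by covariates 1 and 2,
# covariate 5 by covariate 3. Z[i, j] = 1 is taken here to mean
# "covariate i appears in the sub-regression that explains covariate j".
Z <- matrix(0, p, p)
Z[c(1, 2), 4] <- 1
Z[3, 5] <- 1

# Sub-regression coefficients: first row holds the intercepts,
# the next p rows hold the slopes, one column per covariate.
B <- matrix(0, p + 1, p)
B[1, 4] <- 1
B[1 + c(1, 2), 4] <- c(2, -1)
B[1 + 3, 5] <- 0.5

# Hypothetical helper: draw n values from a two-component Gaussian mixture.
draw_mixture <- function(n, means = c(-2, 2), sds = c(1, 0.5),
                         prob = c(0.5, 0.5)) {
  k <- sample(seq_along(means), n, replace = TRUE, prob = prob)
  rnorm(n, mean = means[k], sd = sds[k])
}

# Independent covariates are Gaussian mixtures.
X <- matrix(0, n, p)
for (j in 1:3) X[, j] <- draw_mixture(n)

# Redundant covariates come from sub-regressions on the others plus noise.
sigma_X <- 0.1
for (j in which(colSums(Z) > 0)) {
  X[, j] <- cbind(1, X) %*% B[, j] + rnorm(n, sd = sigma_X)
}

# Response by linear regression (assumed layout: intercept, then p slopes).
A <- c(1, 0.5, 0, -1, 0, 2)
sigma_Y <- 1
Y <- as.vector(cbind(1, X) %*% A + rnorm(n, sd = sigma_Y))

# The sub-regression on covariate 4 should be recovered almost exactly.
r2_sub <- summary(lm(X[, 4] ~ X[, 1] + X[, 2]))$r.squared
```

With a small `sigma_X`, the redundant covariates are nearly exact linear combinations of the others, which is the kind of correlated design the generator is meant to produce for simulations.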