Description Usage Arguments Value Examples
View source: R/mixture_generator.R
Generates a dataset (with an additional validation sample) made of Gaussian mixtures with some of them generated by sub-regressions on others. A response variable is then added by linear regression. This function is used to generate datasets for simulations using CorReg, or just with Gaussian Mitures.
1 2 3 4 5 6 | mixture_generator(n = 130, p = 100, ratio = 0.4, max_compl = 1,
valid = 1000, positive = 0.6, sigma_Y = 10, sigma_X = NULL,
R2 = NULL, R2Y = 0.4, meanvar = NULL, sigmavar = NULL, lambda = 3,
Amax = NULL, lambdapois = 10, gamma = FALSE, gammashape = 1,
gammascale = 0.5, tp1 = 1, tp2 = 1, tp3 = 1, nonlin = 0,
pnonlin = 2, scale = TRUE, Z = NULL)
|
n |
the number of individuals in the learning dataset |
p |
the number of covariates (without the response) |
ratio |
the ratio of covariates generated by sub-regressions on others |
max_compl |
the number of covariates in each sub-regression |
valid |
the number of individuals in the validation sample |
positive |
the ratio of positive coefficients in both the regression and the sub-regressions |
sigma_Y |
the standard deviation for the noise of the regression |
sigma_X |
the standard deviation for the noise of the sub-regressions (all). ignored if |
R2 |
the strength of the sub-regressions (coefficients will be chosen to obtain this value). |
R2Y |
the strength of the main regression (coefficients will be chosen to obtain this value). |
meanvar |
vector of means for the covariates. |
sigmavar |
standard deviation of the covariates. |
lambda |
parameter of the Poisson's law that defines the number of components in Gaussian Mixture models |
Amax |
the maximum number of covariates with non-zero coefficients in the regression |
lambdapois |
parameter used to generate the coefficient in the subregressions. Poisson's distribution. |
gamma |
(boolean) to generate a p-sized vector |
gammashape |
shape parameter of the gamma distribution (if needed) |
gammascale |
scale parameter of the gamma distribution (if needed) |
tp1 |
the ratio of right-side (explicative) covariates allowed to have a non-zero coefficient in the regression |
tp2 |
the ratio of left-side (redundant) covariates allowed to have a non-zero coefficient in the regression |
tp3 |
the ratio of strictly independent covariates allowed to have a non-zero coefficient in the regression |
nonlin |
to use non linear structure (squared or log). If not null, it is the proba to use power pnonlin instead of log. The type is drawn for each link between covariates |
pnonlin |
the power used if non linear structure |
scale |
(boolean) to scale X before computing Y |
Z |
the binary squared adjacency matrix (size p) to obtain. If NULL it is randomly generated, based on |
a list that contains:
X_appr |
matrix of the learning set. |
Y_appr |
Response variable vector (size |
A |
vector of the of the regression generating |
B |
Matrix of the coefficients of sub-regressions (first line : the intercepts) then |
Z |
Binary squared adjacency matrix of size |
X_test |
validation sample generated the same way as |
Y_test |
Response vector associated to the validation sample |
sigma_X |
Vector of the standard deviations of the residuals of the sub-regressions (one value for each sub-regression) |
sigma_Y |
Standard deviation of the residual of the regression that generates |
nbcomp |
vector of the number of components for covariates that are not explained by others. |
1 2 3 4 5 6 7 8 9 10 11 |
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.