Subsampling datasource procedure
datasource.subsample randomly picks the specified number of
samples from the original datasource and, if requested, also adds
noise to the subsampled dataset.
data.frame where columns contain variables and rows contain experiments.
Integer specifying the number of experiments to draw when subsampling the datasource (default: NA).
Integer specifying the number of datasets to be generated for each of the selected original datasources (default: 5).
Integer specifying the desired percentage of local noise to be added to each of the subsampled datasets (default: 20).
Integer specifying the desired percentage of global noise to be added to each of the subsampled datasets (default: 0).
Character specifying the type of the noise to be added: "normal" (default: "normal").
Logical specifying if the datasets should have variability in the number of experiments between them (default: TRUE).
A single value, interpreted as an integer, that sets the random seed;
useful for creating simulations that can be reproduced.
If the argument
experiments is NA, the value of
experiments will be calculated automatically in order to generate
datasets.num smaller datasets that do not contain the
same experiment twice within a dataset.
Each of the subsampled datasets
will have a
number of experiments around
experiments ±20%,
chosen randomly from the original
experiments without replacement.
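The drawing step described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: subsample_rows is a hypothetical helper, the ±20% spread is modelled as a uniform draw of the dataset size, and the automatic calculation of experiments is not attempted.

```r
# Illustrative sketch only; subsample_rows() is a hypothetical helper,
# not part of the package.
subsample_rows <- function(data, experiments, datasets.num = 5, seed = 1) {
  set.seed(seed)                          # reproducible draws
  n <- nrow(data)
  # Size bounds: roughly +/-20% around the target, clamped to [1, n].
  hi <- min(n, round(1.2 * experiments))
  lo <- min(max(1, round(0.8 * experiments)), hi)
  lapply(seq_len(datasets.num), function(i) {
    # Pick a dataset size uniformly in lo..hi (safe when lo == hi).
    size <- lo + sample.int(hi - lo + 1, 1) - 1
    # Draw experiments (rows) without replacement.
    rows <- sample.int(n, size, replace = FALSE)
    data[rows, , drop = FALSE]
  })
}

# Toy datasource: 100 experiments (rows) by 4 variables (columns).
toy <- as.data.frame(matrix(rnorm(100 * 4), nrow = 100))
subs <- subsample_rows(toy, experiments = 20, datasets.num = 5)
sapply(subs, nrow)                        # sizes vary between 16 and 24
```

Sampling row indices rather than rows keeps each subsampled dataset a data.frame with the original columns, and replace = FALSE ensures that no experiment appears twice within a dataset.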
If the argument
experiments is a number,
datasets.num is calculated automatically.
If the specified number of
experiments is greater than or equal
to the original number of experiments, then only a replicate
will be generated, and the subsampled dataset will have the same
dimensions as the original one but the experiments will be
Two different types of noise can be added, which are specified
with the argument
"local": the variance of the noise is different for each variable; it is the specified percentage of that variable's variance (±20%).
"global": the variance of the noise is the same for the whole datasource; it is the specified percentage of the mean variance of all the variables (±20%).
datasource.subsample returns a list in which
each element contains a data.frame of the
subsampled dataset with the specified amount of Gaussian noise
added. Every data.frame contains the same number of variables.
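The difference between local and global Gaussian noise can be sketched as follows. This is a minimal illustration, not the package's code: add_noise_sketch and its parameters local.pct and global.pct are hypothetical names, and the ±20% variation mentioned for each noise type is left out for simplicity.

```r
# Illustrative sketch only; add_noise_sketch(), local.pct and global.pct
# are hypothetical names, not the package's API.
add_noise_sketch <- function(data, local.pct = 20, global.pct = 0) {
  vars <- sapply(data, var)               # variance of each variable
  n <- nrow(data)
  out <- data
  for (j in seq_along(data)) {
    # Local noise: variance is a percentage of this variable's variance.
    sd_local <- sqrt(local.pct / 100 * vars[[j]])
    # Global noise: variance is a percentage of the mean variance
    # over all variables, so it is identical for every column.
    sd_global <- sqrt(global.pct / 100 * mean(vars))
    out[[j]] <- data[[j]] + rnorm(n, 0, sd_local) + rnorm(n, 0, sd_global)
  }
  out
}

set.seed(42)
# Toy datasource with variables on very different scales.
toy_noise <- data.frame(a = rnorm(50, sd = 1), b = rnorm(50, sd = 10))
noisy_local  <- add_noise_sketch(toy_noise, local.pct = 20, global.pct = 0)
noisy_global <- add_noise_sketch(toy_noise, local.pct = 0, global.pct = 20)
```

With local noise each variable is perturbed in proportion to its own spread, whereas global noise applies the same noise level to every variable, so low-variance variables are affected relatively more.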