View source: R/stat_sim_dataset.r
simulate_dataset | R Documentation |
Generate an artificial dataset with correlated variables and defined means and standard deviations.
simulate_dataset(
  n = 5000,
  subsets = 4,
  random_seed = NULL,
  simbase = WoodSimulatR::ws_t_logf,
  loadtype = NULL,
  ...,
  RNGversion = "3.6.0"
)
n |
Number of rows in the dataset |
subsets |
Either NULL, a numeric vector, a character vector, or a dataset (see below) |
random_seed |
Integer seed value for the random number
generator, used to achieve reproducible results
(see also |
simbase |
An object of class simbase_covar or simbase_list |
loadtype |
For passing on to get_subsample_definitions |
... |
arguments passed on to |
RNGversion |
In |
In the package WoodSimulatR, a number of predefined base values for simulation
are stored; see simbase.
Using a character vector for the argument subsets leads to subsets
that are as equal in size as possible.
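
The documentation does not say how rows are distributed when n is not divisible by the number of subsets. As a rough sketch only, one way to get sizes "as equal as possible" is shown below; split_sizes is a hypothetical helper and not part of WoodSimulatR, and the package may assign the leftover rows differently.

```r
# Hypothetical helper, not part of WoodSimulatR: split n rows into k groups
# whose sizes differ by at most one.
split_sizes <- function(n, k) {
  base <- n %/% k                 # rows that every group receives
  extra <- n %% k                 # leftover rows
  base + (seq_len(k) <= extra)    # the first `extra` groups get one more row
}

sizes <- split_sizes(10, 3)
sizes
sum(sizes) == 10
```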
The argument subsets enables differing means and standard deviations
for different subsamples. There are several possible usages:
If subsets = NULL, the information about means and standard
deviations is taken from the simbase. There can still be different
means and standard deviations if simbase is an object of class
simbase_list.
If subsets is a numeric vector or a character vector, it is used as the
argument country in an internal call to get_subsample_definitions.
If subsets is a dataset, the following requirements apply:
identifier columns: The dataset must have one or more
discrete-valued identifier columns (usually character vectors or
factors) which together uniquely identify each row.
In the standard case, as yielded by get_subsample_definitions,
these identifier columns are named "country" and "subsample".
In the general case, the identifier columns are detected as those
columns which are not named share, species, loadtype or literature
and whose names do not end in _mean or _sd.
If the argument simbase is of class simbase_list,
further restrictions apply (see below).
means and standard deviations: For at least one of the
variables defined in the simbase, both the mean and the
standard deviation must be given in each row; the column names for
these values must be the name of the respective variable
from the simbase, suffixed by _mean and _sd, respectively.
optional: A column share can be used to create
subsamples of different sizes proportional to the values in share.
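
The identifier-column rule described above can be sketched in base R as follows. This is an illustration of the stated rule, not the package's internal code: detect_id_columns is a hypothetical helper, the variable name f (giving f_mean and f_sd) is an assumption about what the simbase defines, and all numbers are made up.

```r
# Hypothetical helper, not part of WoodSimulatR: apply the naming rule from the
# text to find the identifier columns of a subsets data frame.
detect_id_columns <- function(df) {
  reserved <- c("share", "species", "loadtype", "literature")
  nm <- names(df)
  nm[!(nm %in% reserved) & !grepl("_(mean|sd)$", nm)]
}

# Illustrative subsets definition; `f` is an assumed simbase variable name
# and the values are invented for the example.
subsets_df <- data.frame(
  country   = c("at", "de"),
  subsample = c("at_1", "de_1"),
  share     = c(3, 2),
  f_mean    = c(28, 30),
  f_sd      = c(9, 10),
  stringsAsFactors = FALSE
)

id_cols <- detect_id_columns(subsets_df)
id_cols

# The identifier columns must uniquely identify each row.
anyDuplicated(subsets_df[id_cols]) == 0
```

A data frame of this shape is what the text describes as a valid dataset value for the subsets argument, provided the _mean and _sd columns refer to variables that actually exist in the chosen simbase.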
The argument simbase can be either an object of class
simbase_covar or of class simbase_list.
Various predefined simbase_covar objects are available
in WoodSimulatR; see simbase.
For objects of class simbase_list, additional
restrictions apply:
The object may only have grouping variable(s) which are also
identifier columns according to the subsets definition
above. If the subsets argument is not a data frame,
the identifier columns are "country" and "subsample".
The value combinations in the identifier columns have to
match those which the subsets argument leads to
(see also get_subsample_definitions).
Both the means and standard deviations in the subsample definitions
(see get_subsample_definitions) and the values in the simbase
depend on the way the destructive testing of the sawn timber was
done. If the simbase has a field loadtype (see also simbase_covar),
this value is used in the call to get_subsample_definitions.
Otherwise, the loadtype has to be passed directly to the present
function, unless no call to get_subsample_definitions is necessary
(this depends on the value of subsets; see above). If a loadtype has
been defined, a variable loadtype is also created in the resulting
dataset for reference.
Negative values in any numeric column of the result dataset are forced to zero.
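
To make the effect of this clamping concrete, here is a minimal base R sketch of the same idea applied to a small, made-up data frame. It is not the package's own code; the column names and values are purely illustrative.

```r
# Illustrative data only; column names are assumptions for the example.
df <- data.frame(
  id = c("a", "b", "c"),
  f  = c(12.5, -0.3, 7),
  E  = c(-100, 11000, 9500),
  stringsAsFactors = FALSE
)

# Force negative values to zero in every numeric column; leave others untouched.
num_cols <- vapply(df, is.numeric, logical(1))
df[num_cols] <- lapply(df[num_cols], function(x) pmax(x, 0))
df
```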
If random_seed is not NULL, reproducibility of results
is enforced by using set.seed with arguments
kind='Mersenne-Twister' and normal.kind='Inversion',
and by calling RNGversion with the value of the argument RNGversion.
If random_seed is not NULL, the random number generator
is reset at the end of the function using set.seed(NULL) and
RNGversion(toString(getRversion())).
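
The seeding and reset behaviour described above can be approximated with the following base R pattern. This is a sketch under assumptions, not the package's implementation: draw_with_seed is a hypothetical function, the argument is called rng_version here to avoid shadowing the base function RNGversion, and the order of the RNGversion and set.seed calls is a guess.

```r
# Hypothetical function, not part of WoodSimulatR: draw n normal values,
# optionally with a fixed seed and a pinned RNG version.
draw_with_seed <- function(n, random_seed = NULL, rng_version = "3.6.0") {
  if (!is.null(random_seed)) {
    # Pin the RNG behaviour, then seed it (call order is an assumption).
    RNGversion(rng_version)
    set.seed(random_seed, kind = "Mersenne-Twister", normal.kind = "Inversion")
    # Reset the generator when the function exits, as the text describes.
    on.exit({
      set.seed(NULL)
      RNGversion(toString(getRversion()))
    }, add = TRUE)
  }
  rnorm(n)
}

x1 <- draw_with_seed(5, random_seed = 1)
x2 <- draw_with_seed(5, random_seed = 1)
identical(x1, x2)
```

Because the generator is re-initialised on exit, random numbers drawn after the call do not continue from the seeded stream.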
simulate_dataset(n = 10, subsets = 1, random_seed = 1)

# As the loadtype is defined in the simbase, the argument loadtype is ignored
# with a warning
simulate_dataset(n = 10, subsets = 1, random_seed = 1, loadtype = 'be')

# Two subsamples
simulate_dataset(n = 10, subsets = 2, random_seed = 1)

# Two subsamples from pre-defined countries
simulate_dataset(n = 10, subsets = c('at', 'de'), random_seed = 1)

# Two subsamples from pre-defined countries with different sample sizes
simulate_dataset(n = 10, subsets = c(at = 3, de = 2), random_seed = 1)