View source: R/simulate_data.R
simulate_data | R Documentation |
Generate simulated dataset based on transformation of an underlying base distribution.
simulate_data(generator, ...)
## Default S3 method:
simulate_data(
generator = function(n) matrix(rnorm(n)),
n_obs = 1,
transform_initial = base::identity,
names_final = NULL,
prefix_final = NULL,
process_final = list(),
seed = NULL,
...
)
## S3 method for class 'simdesign'
simulate_data(
generator,
n_obs = 1,
seed = NULL,
apply_transformation = TRUE,
apply_processing = TRUE,
...
)
generator | Function which generates data from the underlying base distribution. It is assumed it takes the number of simulated observations |
... | Further arguments passed to |
n_obs | Number of simulated observations. |
transform_initial | Function which specifies the transformation of the underlying dataset |
names_final | NULL or character vector with variable names for final dataset |
prefix_final | NULL or prefix attached to variables in final dataset |
process_final | List of lists specifying post-processing functions applied to final datamatrix |
seed | Set random seed to ensure reproducibility of results. |
apply_transformation | This argument can be set to FALSE to override the information stored in the passed |
apply_processing | This argument can be set to FALSE to override the information stored in the passed |
Data is generated using the following procedure:
1. An underlying dataset Z is sampled from some distribution. This is done by a call to the generator function.
2. Z is then transformed into the final dataset X by applying the transform function to Z.
3. X is post-processed if specified (e.g. truncation to avoid outliers).
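The three steps above can be sketched in base R as follows. This is only an illustration of the procedure, not the package's internal code; the helper names `transform` and `truncate`, the two-column layout, and the clipping bounds are assumptions made for the example.

```r
set.seed(24)

# Step 1: sample the underlying dataset Z (here: 10 x 2 standard normal draws).
generator <- function(n) matrix(rnorm(n * 2), nrow = n, ncol = 2)
Z <- generator(10)

# Step 2: transform Z into X (here: exponentiate the second column).
transform <- function(z) cbind(z[, 1], exp(z[, 2]))
X <- transform(Z)

# Step 3: optional post-processing (here: clip all values to [-2, 2]).
truncate <- function(x, lower = -2, upper = 2) pmin(pmax(x, lower), upper)
X <- truncate(X)

dim(X)
```

Keeping each step as a separate function mirrors the separation between generator, transformation, and post-processing described above.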
Data.frame or matrix with n_obs rows for simulated dataset X.
simulate_data(default): Function to be used if no simdesign S3 class is used.
simulate_data(simdesign): Function to be used with simdesign S3 class.
The generator function, which is passed either directly or via a simdata::simdesign object, is assumed to provide the same interface as the random generation functions in the R stats and extraDistr packages. Specifically, it takes the number of observations as its first argument. All further arguments can be set by passing them as named arguments to this function. It is expected to return a two-dimensional array (matrix or data.frame) for which the number of columns can be determined. Otherwise the check_and_infer step will fail.
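A minimal sketch of a generator that follows this interface is shown below. The generator `gen`, its `mean` and `sd` arguments, and the three-column layout are hypothetical. The guarded call at the end assumes, as the text above states, that simulate_data forwards further named arguments to the generator; it only runs if the simdata package is installed.

```r
# Hypothetical generator: the number of observations comes first,
# all further arguments are named.
gen <- function(n, mean = 0, sd = 1) {
  # Return a two-dimensional object so the number of columns is well defined.
  matrix(rnorm(n * 3, mean = mean, sd = sd), nrow = n, ncol = 3)
}

set.seed(1)
Z <- gen(5, mean = 10)
dim(Z)

# Assumption: extra named arguments (here `mean`) are passed on to `gen`.
if (requireNamespace("simdata", quietly = TRUE)) {
  X <- simdata::simulate_data(gen, n_obs = 5, seed = 1, mean = 10)
}
```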
Transformations should be applicable to the output of the generator function (i.e. take a data.frame or matrix as input) and output another data.frame or matrix. A convenience function function_list is provided by this package to specify transformations as a list of functions, which take the whole datamatrix Z as single argument and can be used to apply specific transformations to the columns of that matrix. See the documentation for function_list for details.
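The idea of specifying a transformation as a list of functions can be illustrated with a small base-R stand-in. The helper `make_transform` below is hypothetical and is not the package's function_list; it only shows the pattern in which each function receives the whole datamatrix Z and produces one column of the result.

```r
# Hypothetical stand-in: combine named functions into one transformation.
make_transform <- function(...) {
  fs <- list(...)
  # Each function gets the whole matrix z and returns one output column.
  function(z) as.data.frame(lapply(fs, function(f) f(z)))
}

transform <- make_transform(
  v1 = function(z) z[, 1] * 2,        # rescale the first column
  v2 = function(z) z[, 1] + z[, 2]    # combine two columns
)

Z <- matrix(1:6, nrow = 3)
X <- transform(Z)
X
```

Because every function sees all of Z, an output column may depend on several input columns, which a purely column-wise mapping could not express.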
Post-processing the datamatrix is based on do_processing.
Variables are named by names_final if it is not NULL and of correct length. Otherwise, if prefix_final is not NULL, it is used as prefix for variable numbers. Otherwise, variable names remain as returned by the generator function.
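The naming precedence described above can be sketched as follows. The helper `name_columns` is hypothetical, and the sketch assumes that the prefix is simply pasted in front of the column index; the package's own implementation may differ in detail.

```r
# Hypothetical helper mirroring the naming rules.
name_columns <- function(x, names_final = NULL, prefix_final = NULL) {
  if (!is.null(names_final) && length(names_final) == ncol(x)) {
    # Rule 1: explicit names of correct length take precedence.
    colnames(x) <- names_final
  } else if (!is.null(prefix_final)) {
    # Rule 2: otherwise build names from the prefix and the column number.
    colnames(x) <- paste0(prefix_final, seq_len(ncol(x)))
  }
  # Rule 3: otherwise the names stay as returned by the generator.
  x
}

Z <- matrix(0, nrow = 2, ncol = 3)
colnames(name_columns(Z, names_final = c("a", "b", "c")))
colnames(name_columns(Z, prefix_final = "v"))
colnames(name_columns(Z))
```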
This function is best used in conjunction with the simdesign S3 class or any template based upon it, which facilitates further data visualization and conveniently stores information as a template for simulation tasks.
simdesign, simdesign_mvtnorm, simulate_data_conditional, do_processing
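# Base distribution: univariate normal with mean 0 (default identity covariance)
generator <- function(n) mvtnorm::rmvnorm(n, mean = 0)
# Simulate 10 observations; the seed makes the result reproducible
simulate_data(generator, 10, seed = 24)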