View source: R/csem_resample.R
resampleData | R Documentation |
Resample data from a data set using common resampling methods.
For bootstrap or jackknife resampling, package users usually do not need to
call this function directly; use resamplecSEMResults()
instead.
resampleData( .object = NULL, .data = NULL, .resample_method = c("bootstrap", "jackknife", "permutation", "cross-validation"), .cv_folds = 10, .id = NULL, .R = 499, .seed = NULL )
.object |
An R object of class cSEMResults resulting from a call to |
.data |
A |
.resample_method |
Character string. The resampling method to use. One of: "bootstrap", "jackknife", "permutation", or "cross-validation". Defaults to "bootstrap". |
.cv_folds |
Integer. The number of cross-validation folds to use. Setting
|
.id |
Character string or integer. A character string giving the name or
an integer of the position of the column of |
.R |
Integer. The number of bootstrap runs, permutation runs
or cross-validation repetitions to use. Defaults to |
.seed |
Integer or |
The function resampleData()
is general purpose. It simply resamples data
from a data set according to the resampling method provided
via the .resample_method
argument and returns a list of resamples.
Currently, bootstrap
, jackknife
, permutation
, and cross-validation
(both leave-one-out (LOOCV) and k-fold cross-validation) are implemented.
The user may provide the data set to resample either explicitly via the .data
argument or implicitly by providing a cSEMResults object to .object,
in which case the original data used in the call that created the
cSEMResults object is used for resampling.
If both a cSEMResults object and a data set via .data
are provided,
the former is ignored.
As csem()
accepts a single data set, a list of data sets as well as data sets
that contain a column name used to split the data into groups,
the cSEMResults object may contain multiple data sets.
In this case, resampling is done by data set or group. Note that depending
on the number of data sets/groups provided this computation may be slower
as resampling will be repeated for each data set/group.
To split data provided via the .data
argument into groups, the column name or
the column index of the column containing the group levels to split the data
must be given to .id
. If data that contains grouping is taken from
a cSEMResults object, .id
is taken from the object information. Hence,
providing .id
is redundant in this case and therefore ignored.
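The grouping step can be pictured with base R alone. The following is a minimal sketch under stated assumptions, not the package's implementation: the helper name split_by_id and the toy data are made up for illustration. It accepts either a column name or a column index, mirroring how .id is described above.

```r
# Hypothetical helper (not part of cSEM): split a data.frame into groups
# using either the name or the position of the grouping column.
split_by_id <- function(data, id) {
  # Resolve a column index to a column name so both forms behave the same
  id_name <- if (is.numeric(id)) names(data)[id] else id
  stopifnot(id_name %in% names(data))
  # split() returns a named list with one data.frame per group level
  split(data, f = data[[id_name]])
}

set.seed(1)
toy <- data.frame(
  x1    = rnorm(6),
  group = rep(c("female", "male"), each = 3),
  stringsAsFactors = FALSE
)

by_name  <- split_by_id(toy, "group")  # grouping column given by name
by_index <- split_by_id(toy, 2)        # same column given by position
names(by_name)                         # group levels become list names
```

Resampling "by group" then amounts to applying the chosen resampling method to each element of such a list.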
The number of bootstrap or permutation runs as well as the number of
cross-validation repetitions is given by .R
. The default is
499
but should be increased in real applications. See e.g.,
Hesterberg (2015), p. 380 for recommendations concerning
the bootstrap. For jackknife .R
is ignored as it is based on the N leave-one-out data sets.
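To make the difference between the two methods concrete, here is a small base-R sketch. It is an illustration only and does not reproduce the package's code; the toy data and the object names boot and jack are assumptions made for this example.

```r
# Illustrative base-R sketch (not the cSEM implementation).
set.seed(42)
toy <- data.frame(x1 = rnorm(8), x2 = rnorm(8))
N   <- nrow(toy)
R   <- 5  # number of bootstrap runs; real applications need far more

# Bootstrap: R data sets, each with N rows drawn with replacement
boot <- lapply(seq_len(R), function(r) {
  toy[sample(N, size = N, replace = TRUE), , drop = FALSE]
})

# Jackknife: exactly N data sets, each omitting one observation,
# which is why a run count plays no role here
jack <- lapply(seq_len(N), function(i) toy[-i, , drop = FALSE])

length(boot)  # equals R
length(jack)  # equals N
```

The number of bootstrap samples is a free choice, whereas the number of jackknife samples is fixed by the size of the data set.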
Choosing .resample_method = "permutation"
for ungrouped data causes an error,
as permutation would simply reorder the observations, which is usually not
meaningful. If a list of data is provided,
each list element is assumed to represent the observations belonging to one
group. In this case, the data are pooled and group membership is permuted.
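Pooling the data and permuting group membership can be sketched in a few lines of base R. This is an assumption-laden illustration rather than the package's code; the list groups and the added column group are invented for the example.

```r
# Illustrative base-R sketch (not the cSEM implementation).
set.seed(7)
groups <- list(
  a = data.frame(x = rnorm(4)),
  b = data.frame(x = rnorm(6))
)

# Pool all observations and record each row's original group
pooled <- do.call(rbind, groups)
labels <- rep(names(groups), times = vapply(groups, nrow, integer(1)))

# One permutation run: shuffle the labels, keep the observations fixed
permuted <- pooled
permuted$group <- sample(labels)

table(permuted$group)  # group sizes are preserved, membership is not
```

Without at least two groups there are no labels to shuffle, which is why permuting ungrouped data is not meaningful.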
For cross-validation the number of folds (k
) defaults to 10
. It may be
changed via the .cv_folds
argument. Setting k = 2
(not 1!) splits
the data into a single training and test data set. Setting k = N
(where N
is the
number of observations) produces leave-one-out cross-validation samples.
Note: 1.) At least 2 folds are required (k > 1); 2.) k cannot be larger than N;
3.) If N/k is not an integer, the last fold will have fewer observations.
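The fold logic can be illustrated with a short base-R sketch. The helper make_folds is hypothetical and the allocation rule is an assumption: here the later folds end up one observation smaller when N/k is not an integer, and the package's exact allocation may differ.

```r
# Hypothetical helper (not part of cSEM): split data into k folds.
make_folds <- function(data, k) {
  N <- nrow(data)
  stopifnot(k > 1, k <= N)               # at least 2 folds, at most N folds
  shuffled <- data[sample(N), , drop = FALSE]
  # Fold labels 1..k, recycled to length N and sorted into contiguous blocks
  fold_id <- sort(rep(seq_len(k), length.out = N))
  unname(split(shuffled, fold_id))
}

set.seed(3)
toy <- data.frame(x = rnorm(10))

folds3 <- make_folds(toy, k = 3)          # 10 / 3 is not an integer
loocv  <- make_folds(toy, k = nrow(toy))  # k = N: leave-one-out

vapply(folds3, nrow, integer(1))          # fold sizes

# Using fold 1 as test data and the remaining folds as training data
test_set  <- folds3[[1]]
train_set <- do.call(rbind, folds3[-1])
```

With k = 2 the same helper yields one split into two halves, each of which can serve once as training and once as test data.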
Random number generation (RNG) uses the L'Ecuyer-CMRG RNG stream as implemented in the future.apply package (Bengtsson 2018a). See ?future_lapply for details. By default a random seed is chosen.
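The effect of fixing a seed can be shown with base R's own L'Ecuyer-CMRG generator. The sketch below does not call future.apply and is not how the package seeds its resamples; the helper draw_rows is hypothetical and only demonstrates that equal seeds give equal draws.

```r
# Hypothetical helper (not part of cSEM or future.apply):
# draw bootstrap row indices under the L'Ecuyer-CMRG generator.
draw_rows <- function(n, seed) {
  old_kind <- RNGkind("L'Ecuyer-CMRG")  # switch generator, remember the old one
  on.exit(RNGkind(old_kind[1]))         # restore the previous generator on exit
  set.seed(seed)
  sample(n, size = n, replace = TRUE)
}

a <- draw_rows(10, seed = 2364)
b <- draw_rows(10, seed = 2364)
identical(a, b)  # same seed, same indices
```

Leaving the seed unset corresponds to the default behaviour described above, where every call produces a different set of resamples.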
The structure of the output depends on the type of input and the resampling method:
Bootstrap: if a matrix
or data.frame
without grouping variable
is provided (i.e., .id = NULL
), the result is a list of length .R
(default 499
). Each element of that list is a bootstrap (re)sample.
If a grouping variable is specified or a list of data is provided
(where each list element is assumed to contain data for one group),
resampling is done by group. Hence,
the result is a list of length equal to the number of groups
with each list element containing .R
bootstrap samples based on the
N_g
observations of group g
.
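The nested structure for grouped bootstrap output can be mimicked in base R. This sketch only imitates the shape described above under stated assumptions; the toy data and the name boot_by_group are invented, and the package's actual return value may carry additional attributes.

```r
# Illustrative base-R sketch (not the cSEM implementation).
set.seed(11)
toy <- data.frame(
  x     = rnorm(9),
  group = rep(c("g1", "g2", "g3"), times = c(2, 3, 4)),
  stringsAsFactors = FALSE
)
R <- 4  # bootstrap runs per group

by_group <- split(toy, toy$group)

# Outer list: one element per group; inner list: R resamples of size N_g
boot_by_group <- lapply(by_group, function(d) {
  lapply(seq_len(R), function(r) {
    d[sample(nrow(d), replace = TRUE), , drop = FALSE]
  })
})

length(boot_by_group)   # number of groups
lengths(boot_by_group)  # R resamples in each group
```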
Jackknife: if a matrix
or data.frame
without grouping variable
is provided (.id = NULL
), the result is a list of length equal to the number
of observations/rows (N
) of the data set provided.
Each element of that list is a jackknife (re)sample.
If a grouping variable is specified or a list of data is provided
(where each list element is assumed to contain data for one group),
resampling is done by group. Hence,
the result is a list of length equal to the number of group levels
with each list element containing N_g
jackknife samples based on the
N_g
observations of group g
.
Permutation: if a matrix
or data.frame
without grouping variable
is provided, an error is returned, as permutation would simply reorder the observations.
If a grouping variable is specified or a list of data is provided
(where each list element is assumed to contain data of one group),
group membership is permuted. Hence, the result is a list of length .R
where each element of that list is a permutation (re)sample.
Cross-validation: if a matrix
or data.frame
without grouping variable
is provided, a list of length .R
is returned. Each list element
contains a list containing the k
splits/folds subsequently
used as test and training data sets.
If a grouping variable is specified or a list of data is provided
(where each list element is assumed to contain data for one group),
cross-validation is repeated .R
times for each group. Hence,
the result is a list of length equal to the number of groups,
each containing .R
list elements (the repetitions) which in turn contain
the k
splits/folds.
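For ungrouped data, the shape of repeated cross-validation output (a list of repetitions, each holding k folds) can be imitated as follows. This is a sketch under assumptions, not the package's code; the names cv, R, and k are chosen for illustration only.

```r
# Illustrative base-R sketch (not the cSEM implementation).
set.seed(5)
toy <- data.frame(x = rnorm(12))
R <- 3  # repetitions
k <- 4  # folds per repetition

cv <- lapply(seq_len(R), function(r) {
  # A fresh random assignment of rows to folds in every repetition
  fold_id <- sample(rep(seq_len(k), length.out = nrow(toy)))
  unname(split(toy, fold_id))
})

length(cv)   # R repetitions
lengths(cv)  # k folds in each repetition
```

For grouped data the same construction would simply be wrapped in one more list level, one element per group.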
csem()
, cSEMResults, resamplecSEMResults()
library(cSEM)  # provides resampleData(), csem(), and the satisfaction data

# ===========================================================================
# Using the raw data
# ===========================================================================
### Bootstrap (default) -----------------------------------------------------

res_boot1 <- resampleData(.data = satisfaction)
str(res_boot1, max.level = 3, list.len = 3)

## To replicate a bootstrap draw use .seed:
res_boot1a <- resampleData(.data = satisfaction, .seed = 2364)
res_boot1b <- resampleData(.data = satisfaction, .seed = 2364)

# Compare the two seeded draws (res_boot1 was drawn without a seed)
identical(res_boot1a, res_boot1b) # TRUE

### Jackknife ---------------------------------------------------------------

res_jack <- resampleData(.data = satisfaction, .resample_method = "jackknife")
str(res_jack, max.level = 3, list.len = 3)

### Cross-validation --------------------------------------------------------
## Create dataset for illustration:
dat <- data.frame(
  "x1"    = rnorm(100),
  "x2"    = rnorm(100),
  "group" = sample(c("male", "female"), size = 100, replace = TRUE),
  stringsAsFactors = FALSE)

## 10-fold cross-validation (repeated 100 times)
cv_10a <- resampleData(.data = dat, .resample_method = "cross-validation",
                       .R = 100)
str(cv_10a, max.level = 3, list.len = 3)

# Cross-validation can be done by group if a group identifier is provided:
cv_10 <- resampleData(.data = dat, .resample_method = "cross-validation",
                      .id = "group", .R = 100)

## Leave-one-out-cross-validation (repeated 50 times)
cv_loocv <- resampleData(.data = dat[, -3],
                         .resample_method = "cross-validation",
                         .cv_folds = nrow(dat),
                         .R = 50)
str(cv_loocv, max.level = 2, list.len = 3)

### Permutation --------------------------------------------------------------

res_perm <- resampleData(.data = dat, .resample_method = "permutation",
                         .id = "group")
str(res_perm, max.level = 2, list.len = 3)

# Forgetting to set .id causes an error
## Not run: 
res_perm <- resampleData(.data = dat, .resample_method = "permutation")
## End(Not run)

# ===========================================================================
# Using a cSEMResults object
# ===========================================================================

model <- "
# Structural model
QUAL ~ EXPE
EXPE ~ IMAG
SAT  ~ IMAG + EXPE + QUAL + VAL
LOY  ~ IMAG + SAT
VAL  ~ EXPE + QUAL

# Measurement model
EXPE =~ expe1 + expe2 + expe3 + expe4 + expe5
IMAG =~ imag1 + imag2 + imag3 + imag4 + imag5
LOY  =~ loy1  + loy2  + loy3  + loy4
QUAL =~ qual1 + qual2 + qual3 + qual4 + qual5
SAT  =~ sat1  + sat2  + sat3  + sat4
VAL  =~ val1  + val2  + val3  + val4
"
a <- csem(satisfaction, model)

# Create bootstrap and jackknife samples
res_boot <- resampleData(a, .resample_method = "bootstrap", .R = 499)
res_jack <- resampleData(a, .resample_method = "jackknife")

# Since `satisfaction` is the dataset used, the following approaches yield
# identical results.
res_boot_data   <- resampleData(.data = satisfaction, .seed = 2364)
res_boot_object <- resampleData(a, .seed = 2364)

identical(res_boot_data, res_boot_object) # TRUE