create_resamples | R Documentation
The function creates (sub)samples of the data to be used in three different
model blocks: train (fitting), test (calibration, tuning), and validate.
By default, samples are created through bootstrapping, i.e. with replacement. This means that
an observation can be repeated within a given sample block, but observations included
in one block are always excluded from the other blocks (e.g. observations selected
for validation are absent from the train and test blocks).
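The block logic described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's code: the helper `sketch_resample_blocks()` is hypothetical, block sizes are assumed to be `round(n * p)`, and the blocks are assumed to be drawn one after the other, each from the observations not used by the previous ones.

```r
# Hypothetical helper (not part of the package): draws one resample made of
# three blocks (train, test, validate) that never share observations.
sketch_resample_blocks <- function(n, p = c(0.4, 0.2, 0.2), replace = TRUE) {
  stopifnot(length(p) == 3, all(p >= 0), sum(p) <= 1)
  available <- seq_len(n)
  blocks <- setNames(vector("list", 3), c("train", "test", "validate"))
  for (i in seq_along(p)) {
    size <- round(n * p[i])  # assumed block size
    # Draw only from observations not used by earlier blocks; with
    # replace = TRUE an index may appear several times within this block
    drawn <- available[sample.int(length(available), size, replace = replace)]
    blocks[[i]] <- drawn
    # Drop everything drawn here so later blocks cannot reuse it
    available <- setdiff(available, drawn)
  }
  blocks
}

set.seed(1)
b <- sketch_resample_blocks(200)
lengths(b)  # train 80, test 40, validate 40
```

With `replace = TRUE` repeated indices are expected inside a block, while `intersect()` between any two blocks stays empty.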
Samples can be created at random (if sp_strat = NULL, the default) or with spatial
stratification (spatial strata can be created with the function spat_strat()).
In the latter case, train and test sets are split spatially, allowing for a more
thorough cross-validation when defining the penalty parameter of the penalized regressions.
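To make the idea of a spatial split concrete, here is a minimal base-R sketch. It assumes, for illustration only, that spatial strata are cells of a regular grid laid over the coordinates and that whole cells are assigned to either train or test; both helpers are hypothetical, and `spat_strat()` may build its strata differently.

```r
# Hypothetical helper: assigns each point to a cell of an n_cells x n_cells grid
sketch_grid_strata <- function(x, y, n_cells = 4) {
  cx <- cut(x, breaks = n_cells, labels = FALSE)  # column index 1..n_cells
  cy <- cut(y, breaks = n_cells, labels = FALSE)  # row index 1..n_cells
  (cy - 1) * n_cells + cx                         # one id per grid cell
}

# Hypothetical helper: train and test receive disjoint sets of grid cells,
# so the two sets are spatially separated
sketch_spatial_split <- function(strata, n_train_blocks) {
  ids <- unique(strata)
  stopifnot(n_train_blocks < length(ids))
  train_blocks <- ids[sample.int(length(ids), n_train_blocks)]
  list(train = which(strata %in% train_blocks),
       test  = which(!(strata %in% train_blocks)))
}

set.seed(1)
x <- runif(500); y <- runif(500)
strata <- sketch_grid_strata(x, y)
tt <- sketch_spatial_split(strata, n_train_blocks = 10)
length(intersect(strata[tt$train], strata[tt$test]))  # 0: no cell is shared
```

Splitting by cells rather than by points keeps nearby, spatially autocorrelated observations on the same side of the split.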
Optionally, if colH0 is provided, samples can also account for a grouping variable H0
(with classes or groups) to be used for (block cross-)validation; this is not a requirement.
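When an H0 grouping variable is given, whole groups (for instance individual animals) can be held out together, as in leave-group-out validation. The sketch below is a hypothetical base-R illustration of that idea, not the package's implementation: it assumes one group label per observation and moves every observation of the sampled groups to the validation block.

```r
# Hypothetical helper: holds out entire H0 groups for validation so that no
# group is split between validation and the remaining (train/test) data
sketch_blockH0_validation <- function(groups, n_val_groups = 1) {
  g <- unique(groups)
  stopifnot(n_val_groups < length(g))
  val_groups <- g[sample.int(length(g), n_val_groups)]
  validate <- which(groups %in% val_groups)
  list(validate   = validate,
       remaining  = setdiff(seq_along(groups), validate),
       val_groups = val_groups)
}

set.seed(1)
animal <- rep(c("a1", "a2", "a3", "a4", "a5"), times = c(30, 50, 20, 40, 60))
v <- sketch_blockH0_validation(animal, n_val_groups = 2)
table(animal[v$validate])  # only the two held-out animals appear here
```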
create_resamples(
  y,
  times = 10,
  p = c(0.4, 0.2, 0.2),
  max_size_blockH0_validation = 1000,
  max_size_blockH0_train = 1000,
  max_size_blockH0_test = 1000,
  max_number_blocksH1_train = 40,
  sp_strat = NULL,
  colH0 = NULL,
  H0setup = c("LAO", "LOO")[1],
  replace = TRUE
)
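Argument | Description |
y | Vector with the data observations to be resampled (e.g. 1:nrow(data)). |
times | Number of resamples to create. |
p | Proportions of the data included in the train, test, and validation blocks, respectively. |
max_size_blockH0_validation | |
max_size_blockH0_train | |
max_size_blockH0_test | |
max_number_blocksH1_train | |
sp_strat | Spatial strata, as created by spat_strat(). If NULL (default), samples are created at random. |
colH0 | Optional grouping variable H0 used for (block cross-)validation: a vector with the group of each observation, or the name of the H0 block column when sp_strat is provided (see the examples). |
H0setup | Not implemented yet. |
replace | Logical; if TRUE (default), samples are drawn with replacement (bootstrapping). |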
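As a worked example of how p translates into block sizes, assuming each entry of p is the fraction of the n observations drawn for that block and that sizes are rounded with round() (both assumptions of this sketch): with n = 200 and the default p = c(0.4, 0.2, 0.2), the blocks receive 80, 40, and 40 draws, and 40 observations are not allocated to any block. The helper name is hypothetical.

```r
# Hypothetical helper: expected block sizes for n observations, assuming
# each element of p is the fraction of n drawn for that block
sketch_block_sizes <- function(n, p = c(0.4, 0.2, 0.2)) {
  stopifnot(length(p) == 3, all(p >= 0), sum(p) <= 1)
  sizes <- round(n * p)
  c(train = sizes[1], test = sizes[2], validate = sizes[3],
    unallocated = n - sum(sizes))
}

sketch_block_sizes(200)  # train 80, test 40, validate 40, unallocated 40
```

The larger p[1] is, the fewer observations (or spatial blocks) remain for the test and validation blocks, which is one way a high p[1] can become infeasible.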
A list of lists with the train, test, and validation sets, each containing
the indices of the observations to be kept in each resample.
If colH0 is not NULL, a vector with the H0 block to which each observation
belongs is also appended to the output.
If sp_strat is provided, a list of H0 blocks and possibly a list of strata
may also be returned.
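The shape described above (one list per block, each holding one index vector per resample) can be mimicked with a small self-contained mock, which also shows how to check that blocks do not overlap within a resample. The mock generator `mock_resamples()` and the element name `validate` are assumptions for illustration; the examples only confirm the `train` and `test` elements. Unlike a sequential draw, this mock first partitions the observations into three disjoint pools and then bootstraps within each pool, one possible way to obtain the described behavior.

```r
# Hypothetical mock of the returned structure: `times` resamples, each with
# disjoint train/test/validate index vectors (bootstrapped within blocks)
mock_resamples <- function(n, times = 3, p = c(0.4, 0.2, 0.2)) {
  sizes <- round(n * p)
  stopifnot(length(p) == 3, all(sizes > 0), sum(sizes) <= n)
  cuts <- cumsum(sizes)
  draws <- lapply(seq_len(times), function(i) {
    idx <- sample.int(n)  # random permutation of 1..n
    # cut the permutation into three disjoint pools
    pools <- list(idx[1:cuts[1]],
                  idx[(cuts[1] + 1):cuts[2]],
                  idx[(cuts[2] + 1):cuts[3]])
    # bootstrap within each pool: repeats stay inside one block
    lapply(pools, function(pool) pool[sample.int(length(pool), replace = TRUE)])
  })
  list(train    = lapply(draws, `[[`, 1),
       test     = lapply(draws, `[[`, 2),
       validate = lapply(draws, `[[`, 3))
}

set.seed(1)
samples <- mock_resamples(200, times = 3)
head(samples$train[[1]])  # indices of the first resample's train block
# Overlap between train and validate in each resample
sapply(seq_along(samples$train), function(i)
  length(intersect(samples$train[[i]], samples$validate[[i]])))  # 0 0 0
```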
# random sampling, no validation block H0
y <- runif(200)
samples <- create_resamples(y, p = c(0.4, 0.2, 0.2), times = 5)
samples
# with validation block H0
data(reindeer)
library(terra)
library(amt)
# random sampling, with validation block H0
samples <- create_resamples(1:nrow(reindeer), times = 5,
                            p = c(0.2, 0.2, 0.2),
                            max_size_blockH0_validation = 1000,
                            colH0 = reindeer$original_animal_id)
samples
# spatially stratified sampling, with validation block H0
spst <- spat_strat(reindeer, coords = c("x", "y"), colH0 = "original_animal_id",
                   all_cols = FALSE)
samples <- create_resamples(1:nrow(reindeer), times = 5,
                            p = c(0.2, 0.2, 0.2),
                            max_number_blocksH1_train = 20,
                            sp_strat = spst,
                            colH0 = "blockH0")
samples
sum(is.na(samples$test[[1]]))
sapply(samples$train, function(x) sum(is.na(x)))
sapply(samples$test, function(x) sum(is.na(x)))
# a small number of blocks or a too high p[1] may cause errors
samples <- create_resamples(1:nrow(reindeer), times = 10,
                            max_number_blocksH1_train = 3,
                            sp_strat = spst,
                            colH0 = "blockH0")