create_resamples: Create samples for fitting, calibrating, and validating...
In NINAnor/oneimpact: Tools for the assessment of the cumulative impacts of anthropogenic features in ecological studies

create_resamples

R Documentation

Create samples for fitting, calibrating, and validating models

Description

The function creates (sub)samples of the data for three model blocks: train (fitting), test (calibration, tuning), and validate. By default, samples are created through bootstrapping, i.e. with replacement. This means observations can be repeated within a given sample block, but observations included in one block are necessarily excluded from the other blocks (e.g. observations selected for validation will be absent from the train and test blocks). Samples can be created at random (if sp_strat = NULL, the default) or with spatial stratification (spatial strata can be created with the function spat_strat()). In the latter case, train and test sets are spatially split, to allow for a more thorough cross-validation when defining the penalty parameter in penalized regressions. Samples may also include a specific variable H0 (with classes or groups) to be used for (block cross-)validation if colH0 is provided, but this is not a requirement.
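The idea described above (disjoint blocks, with bootstrapping inside each block) can be illustrated with a minimal base-R sketch. This is not the oneimpact implementation; the names `pools` and `resample` are hypothetical, and the block labels are chosen here only for illustration.

```r
# Minimal base-R sketch (an assumption-laden illustration, not package code):
# split indices into disjoint pools, then bootstrap within each pool.
set.seed(1)
n <- 200
p <- c(train = 0.4, test = 0.2, validate = 0.2)

# Shuffle all indices once and carve out non-overlapping pools
shuffled <- sample.int(n)
sizes <- round(p * n)
ends <- cumsum(sizes)
starts <- ends - sizes + 1
pools <- Map(function(s, e) shuffled[s:e], starts, ends)
names(pools) <- names(p)

# Bootstrap (sample with replacement) inside each pool
resample <- lapply(pools, function(idx) {
  idx[sample.int(length(idx), length(idx), replace = TRUE)]
})

# Observations may repeat within a block, but blocks never share observations
shared <- intersect(resample$train, c(resample$test, resample$validate))
length(shared)
```

Because each block draws only from its own pool, repeated observations can occur within a block while the blocks themselves stay mutually exclusive.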

Usage

create_resamples(
  y,
  times = 10,
  p = c(0.4, 0.2, 0.2),
  max_size_blockH0_validation = 1000,
  max_size_blockH0_train = 1000,
  max_size_blockH0_test = 1000,
  max_number_blocksH1_train = 40,
  sp_strat = NULL,
  colH0 = NULL,
  H0setup = c("LAO", "LOO")[1],
  replace = TRUE
)

Arguments

`y`	`[vector]` A vector of outcomes. It can be the response variable for the data set of interest, or only the `case = 1` cases for conditional logistic (step-selection) analyses.
`times`	`[numeric(1)=10]` The number of partitions or resamples to be created.
`p`	`[numeric(3)=c(0.4,0.2,0.2)]` A 3-element numeric vector with the proportions of the data that go to fitting/training (H1), testing (H2), and validation (H0). Values should be between 0 and 1 and should not sum to more than 1.
`max_size_blockH0_validation`	`⁠[numeric(1)=1000]⁠` Maximum size of the blocks H0 (e.g. population, area, year) for validation block H0. Used to limit the number of observations in the validation set, to avoid sampling too many observations of the block H0 levels with more observations, for imbalanced data sets. To find out about meaningful values for this parameter, use `explore_blocks_pre()` and `explore_blocks()`.
`max_size_blockH0_train`	`⁠[numeric(1)=1000]⁠` Maximum size of the blocks H0 (e.g. population, area, year) for training/fitting the model. Used to limit the number of observations in the train set, to avoid sampling too many observations of the block H0 levels with more observations, for imbalanced data sets. To find out about meaningful values for this parameter, use `explore_blocks()`.
`max_size_blockH0_test`	`⁠[numeric(1)=1000]⁠` Not implemented yet.
`max_number_blocksH1_train`	`[numeric(1)=40]` Maximum number of levels or blocks H1 to be used for model fitting/training. This is only meaningful if there is spatial stratification (i.e. if `sp_strat` is not `NULL`). To find out about meaningful values for this parameter, use `explore_blocks()`.
`sp_strat`	`⁠[data.frame]⁠` Default is `NULL`. If not `NULL`, the `data.frame` resulting from `spat_strat()` should be provided here.
`colH0`	`[numeric,character,vector]` Column number or name to define the IDs of the H0 level, i.e. the one with ecological meaning (e.g. individual, population, or study area), used for validating the predictions of the fitted model. If `sp_strat` is provided, `colH0` is a string with the column name (or the column number) in the `sp_strat` table. If `sp_strat = NULL`, `colH0` is a vector of H0 values with the same length as `y`. If `colH0 = NULL` (default), no H0 level is defined and there is no block cross-validation in the bootstrapped sets.
`H0setup`	Not implemented yet.
`replace`	`⁠[logical(1)=TRUE]⁠` Whether to perform the bootstrap sampling with or without replacement (Default is `TRUE`).
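To make the purpose of the `max_size_blockH0_*` arguments more concrete, the following base-R sketch shows one way a per-level cap can limit over-represented H0 levels in an imbalanced data set. It is a hypothetical illustration under simple assumptions, not the package's internal code; `h0`, `max_size`, and `capped` are names invented for this example.

```r
# Hypothetical illustration of capping observations per H0 level
set.seed(42)

# Imbalanced grouping variable: three H0 levels with very different sizes
h0 <- rep(c("a", "b", "c"), times = c(500, 50, 5))
max_size <- 100  # hypothetical cap per H0 level

# For each level, keep at most `max_size` row indices
idx_by_level <- split(seq_along(h0), h0)
capped <- lapply(idx_by_level, function(idx) {
  if (length(idx) > max_size) idx[sample.int(length(idx), max_size)] else idx
})

# Number of observations retained per level: a = 100, b = 50, c = 5
lengths(capped)
```

Only the largest level is subsampled; levels already below the cap are kept in full.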

Value

A list of lists for the train, test, and validation sets, each containing the indices of the observations to be kept in each resample. If colH0 is not NULL, a vector with the block H0 to which each observation belongs is also appended to the output. If sp_strat is provided, a list of blocks H0 and possibly a list of strata might also be returned.
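As a sketch of how such index lists might be used to subset a data set, the snippet below builds a small mock object rather than calling the package. The element names `train` and `test` follow the Examples section; the presence of `NA` values in the index vectors is an assumption suggested by the `is.na()` checks in the examples, and `samples_mock` and `get_rows()` are hypothetical.

```r
# Mock object mimicking the documented layout (an assumption, not real output)
dat <- data.frame(id = 1:10, x = (1:10)^2)
samples_mock <- list(
  train = list(c(1, 1, 3, NA), c(2, 4, 4, 5)),
  test  = list(c(6, 7, NA, NA), c(8, 8, 9, NA))
)

# Extract the rows for one resample, dropping any NA indices
get_rows <- function(data, idx) data[idx[!is.na(idx)], , drop = FALSE]

train_1 <- get_rows(dat, samples_mock$train[[1]])
test_1  <- get_rows(dat, samples_mock$test[[1]])

nrow(train_1)  # 3: observation 1 appears twice, as bootstrapping allows
nrow(test_1)   # 2
```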

Examples

library(oneimpact)

# random sampling, no validation block H0
y <- runif(200)
samples <- create_resamples(y, p = c(0.4, 0.2, 0.2), times = 5)
samples

# with validation block H0
data(reindeer)
library(terra)
library(amt)

# random sampling, with validation block H0
samples <- create_resamples(1:nrow(reindeer), times = 5,
                            p = c(0.2, 0.2, 0.2),
                            max_size_blockH0_validation = 1000,
                            colH0 = reindeer$original_animal_id)
samples

# spatially stratified sampling, with validation block H0
spst <- spat_strat(reindeer, coords = c("x", "y"), colH0 = "original_animal_id",
                   all_cols = FALSE)
samples <- create_resamples(1:nrow(reindeer), times = 5,
                            p = c(0.2, 0.2, 0.2),
                            max_number_blocksH1_train = 20,
                            sp_strat = spst,
                            colH0 = "blockH0")
samples
sum(is.na(samples$test[[1]]))
sapply(samples$train, function(x) sum(is.na(x)))
sapply(samples$test, function(x) sum(is.na(x)))

# a small number of blocks or too high a value of p[1] might result in errors
samples <- create_resamples(1:nrow(reindeer), times = 10,
                            max_number_blocksH1_train = 3,
                            sp_strat = spst,
                            colH0 = "blockH0")

NINAnor/oneimpact documentation built on June 14, 2025, 12:27 a.m.
