sensitivity_analysis_qual: Perform sensitivity analysis on ecometric models (qualitative...


sensitivity_analysis_qual    R Documentation

Perform sensitivity analysis on ecometric models (qualitative environmental variables)

Description

Evaluates how varying sample sizes affect the performance of ecometric models, focusing on two aspects:

  • Sensitivity (internal consistency): How accurately the model predicts environmental conditions on the same data it was trained on.

  • Transferability (external applicability): How well the model performs on unseen data.

It tests different sample sizes by resampling the data multiple times (bootstrap iterations), training an ecometric model on each subset, and evaluating prediction error and correlation.
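The resampling procedure described above can be sketched in base R as follows. This is only an illustration under stated assumptions: toy_points, run_one_iteration(), and the majority-class lookup on a single binned trait axis are hypothetical stand-ins, not the package's internal model or data.

```r
# Illustrative sketch only: toy data and a toy classifier, not the package internals.
set.seed(42)

# Hypothetical stand-in for points_df: two trait summaries and a noisy category.
toy_points <- data.frame(summ_trait_1 = runif(200), summ_trait_2 = runif(200))
base_class <- toy_points$summ_trait_1 > 0.5
flipped <- runif(200) < 0.1                     # about 10% label noise
toy_points$vegetation <- ifelse(xor(base_class, flipped), "forest", "grassland")

# Hypothetical helper: subsample, split, fit a majority-class lookup per bin,
# and return training and testing accuracy.
run_one_iteration <- function(df, n, test_split = 0.2) {
  subset_df <- df[sample(nrow(df), n), ]        # draw n rows without replacement
  n_test <- max(1, round(n * test_split))
  test_idx <- sample(n, n_test)
  train <- subset_df[-test_idx, ]
  test <- subset_df[test_idx, ]

  breaks <- seq(0, 1, length.out = 5)           # 4 equal-width bins on trait 1
  bin_of <- function(x) cut(x, breaks, include.lowest = TRUE)
  # Most frequent category in each bin of the training data
  lookup <- tapply(train$vegetation, bin_of(train$summ_trait_1),
                   function(v) names(which.max(table(v))))
  predict_cat <- function(d) unname(lookup[as.character(bin_of(d$summ_trait_1))])

  c(train_accuracy = mean(predict_cat(train) == train$vegetation, na.rm = TRUE),
    test_accuracy  = mean(predict_cat(test) == test$vegetation, na.rm = TRUE))
}

# Repeat for several sample sizes and iterations, then average per sample size.
sample_sizes <- c(40, 80, 160)
iterations <- 5
results <- do.call(rbind, lapply(sample_sizes, function(n) {
  acc <- t(replicate(iterations, run_one_iteration(toy_points, n)))
  data.frame(sample_size = n, iteration = seq_len(iterations), acc)
}))
summary_toy <- aggregate(cbind(train_accuracy, test_accuracy) ~ sample_size,
                         data = results, FUN = mean)
```

Here results holds one row per iteration and summary_toy one row per sample size, mirroring the raw-versus-summary split of the returned list. The column names are chosen for this sketch and may differ from those the function actually returns.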

Usage

sensitivity_analysis_qual(
  points_df,
  category_col,
  sample_sizes,
  iterations = 20,
  test_split = 0.2,
  grid_bins_1 = NULL,
  grid_bins_2 = NULL,
  parallel = TRUE,
  n_cores = parallel::detectCores() - 1
)

Arguments

points_df

The first element of the list returned by summarize_traits_by_point(). A data frame with the columns summ_trait_1, summ_trait_2, count_trait, and the environmental variable.

category_col

Name of the column in points_df containing the categorical (qualitative) environmental variable.

sample_sizes

Numeric vector specifying the numbers of communities (sampling points) to evaluate in the sensitivity analysis. For each value, a random subset of the data of that size is drawn without replacement and then split into training and testing sets using the proportion defined by test_split (by default, 80% training and 20% testing). All values in sample_sizes must be less than or equal to the number of rows in points_df and large enough to allow the split defined by test_split (i.e., both the training and testing sets must contain at least 30 communities).
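As a quick way to reason about these constraints, the sketch below tabulates the training and testing set sizes implied by each candidate sample size. split_sizes() is a hypothetical helper written for this illustration, and the use of round() is an assumption; the function may round the split differently.

```r
# Hypothetical helper (not part of the package): sizes implied by test_split.
# Assumes the testing set size is round(n * test_split).
split_sizes <- function(sample_sizes, n_points, test_split = 0.2) {
  n_test <- round(sample_sizes * test_split)
  data.frame(
    sample_size = sample_sizes,
    n_train = sample_sizes - n_test,
    n_test = n_test,
    fits_data = sample_sizes <= n_points   # cannot draw more rows than exist
  )
}

# Candidate sample sizes checked against a data set of 100 sampling points
sizes <- split_sizes(seq(40, 90, 10), n_points = 100)
print(sizes)
```

Comparing the n_train and n_test columns against the required minimum shows which sample sizes are usable before starting a long run.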

iterations

Number of bootstrap iterations per sample size (default = 20).

test_split

Proportion of data to use for testing (default = 0.2).

grid_bins_1

Number of bins for the first trait axis. If NULL (default), the number is calculated automatically using Scott's rule via optimal_bins().

grid_bins_2

Number of bins for the second trait axis. If NULL (default), the number is calculated automatically using Scott's rule via optimal_bins().
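For orientation, Scott's rule sets the bin width to roughly 3.5 * sd(x) * n^(-1/3) and derives the number of bins from the data range. The sketch below computes this by hand and with base R's grDevices::nclass.scott(). Whether optimal_bins() follows exactly this formula is not stated here, so treat it as an approximation of what the default does; the trait vector is simulated.

```r
# Illustrative only: simulated trait summaries, not package data.
set.seed(1)
trait_values <- rnorm(150, mean = 0.6, sd = 0.1)

# Scott's rule by hand: bin width, then number of bins over the data range
bin_width <- 3.5 * sqrt(stats::var(trait_values)) * length(trait_values)^(-1/3)
bins_manual <- max(1, ceiling(diff(range(trait_values)) / bin_width))

# The same rule as implemented in base R
bins_builtin <- grDevices::nclass.scott(trait_values)
```

Passing an explicit integer to grid_bins_1 or grid_bins_2 bypasses the automatic choice, which can be useful for keeping the grid identical across runs.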

parallel

Logical; whether to run iterations in parallel (default = TRUE).

n_cores

Number of cores to use when parallel = TRUE (default = parallel::detectCores() - 1).
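The default expression evaluates to 0 on a single-core machine, and parallel::detectCores() can return NA on some platforms. A defensive way to choose a value before calling the function might look like the sketch below; safe_cores is a name introduced only for this illustration.

```r
# Guard against 0 or NA from detectCores(); always use at least one core.
safe_cores <- max(1L, parallel::detectCores() - 1L, na.rm = TRUE)
```

The result can then be passed as n_cores = safe_cores, or parallel = FALSE can be set for small runs where the overhead of starting workers outweighs the gain.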

Details

Two plots are generated:

  1. Training Accuracy vs. Sample size: Reflects internal model consistency.

  2. Testing Accuracy vs. Sample size: Reflects external model performance.

Parallel processing is supported to speed up the analysis.

Value

A list containing:

combined_results

All raw iteration results.

summary_results

Mean accuracy per sample size.

Examples


# Load internal data
data("geoPoints", package = "commecometrics")
data("traits", package = "commecometrics")
data("spRanges", package = "commecometrics")

# Summarize trait values at sampling points
traitsByPoint <- summarize_traits_by_point(
  points_df = geoPoints,
  trait_df = traits,
  species_polygons = spRanges,
  trait_column = "RBL",
  species_name_col = "sci_name",
  continent = FALSE,
  parallel = FALSE
)

# Run sensitivity analysis for dominant land cover class
sensitivityQual <- sensitivity_analysis_qual(
  points_df = traitsByPoint$points,
  category_col = "vegetation",
  sample_sizes = seq(40, 90, 10),
  iterations = 5,
  parallel = FALSE
)

# View results
head(sensitivityQual$summary_results)


commecometrics documentation built on Aug. 8, 2025, 6:10 p.m.