sensitivity_analysis: Perform sensitivity analysis on ecometric models...

View source: R/sensitivity_analysis.R

sensitivity_analysis    R Documentation

Perform sensitivity analysis on ecometric models (quantitative environmental variables)

Description

Evaluates how varying sample sizes affect the performance of ecometric models, focusing on two aspects:

  • Sensitivity (internal consistency): how accurately the model predicts environmental conditions for the same communities it was trained on.

  • Transferability (external applicability): how well the model predicts environmental conditions for held-out communities that were not used in training.

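For each sample size, the data are resampled multiple times (bootstrap iterations). An ecometric model is trained on each subset and evaluated on both its training and testing sets by prediction error (mean absolute anomaly) and by the correlation between predicted and observed values.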
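The resampling scheme described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the helper make_splits() is hypothetical, model fitting is omitted, and computing the test-set size with round() is an assumption.

```r
# Hypothetical helper (not part of commecometrics) illustrating the resampling
# scheme: for each sample size and iteration, draw a subset of rows without
# replacement, then hold out a fraction (test_split) of that subset for testing.
make_splits <- function(n_rows, sample_sizes, iterations = 20, test_split = 0.2) {
  splits <- list()
  for (s in sample_sizes) {
    for (i in seq_len(iterations)) {
      subset_idx <- sample.int(n_rows, s)            # s distinct rows; errors if s > n_rows
      n_test <- round(s * test_split)                # assumed rounding rule
      test_idx <- subset_idx[sample.int(s, n_test)]  # rows held out for evaluation
      splits[[length(splits) + 1]] <- list(
        sample_size = s,
        iteration   = i,
        train       = setdiff(subset_idx, test_idx), # rows used to fit the model
        test        = test_idx
      )
    }
  }
  splits
}

set.seed(42)
splits <- make_splits(n_rows = 200, sample_sizes = c(50, 100), iterations = 3)
length(splits)  # 6 (2 sample sizes x 3 iterations)
```

In this sketch, a model would be fitted on each train set and scored on train (sensitivity) and on test (transferability).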

Usage

sensitivity_analysis(
  points_df,
  env_var,
  sample_sizes,
  iterations = 20,
  test_split = 0.2,
  grid_bins_1 = NULL,
  grid_bins_2 = NULL,
  transform_fun = NULL,
  parallel = TRUE,
  n_cores = parallel::detectCores() - 1
)

Arguments

points_df

The first element of the list returned by summarize_traits_by_point() (e.g., traitsByPoint$points in the Examples). A data frame with columns: summ_trait_1, summ_trait_2, count_trait, and the environmental variable.

env_var

Name of the environmental variable column in points_df (e.g., "precip").

sample_sizes

Numeric vector specifying the number of communities (sampling points) to evaluate in the sensitivity analysis. For each value, a random subset of that size is drawn without replacement and then split into training and testing sets according to test_split (by default, 80% training and 20% testing). All values in sample_sizes must be less than or equal to the number of rows in points_df and large enough that both the training and testing sets contain at least 30 communities.
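The conditions on sample_sizes can be checked before calling the function. The helper below is hypothetical: it takes the documented minimum of 30 communities per set as a parameter (min_per_set) and assumes the test-set size is round(sample size times test_split).

```r
# Hypothetical validation helper (not part of commecometrics).
# Returns TRUE for each sample size that fits in the data and leaves at least
# min_per_set communities in both the training and the testing sets.
valid_sample_sizes <- function(sample_sizes, n_rows, test_split = 0.2,
                               min_per_set = 30) {
  n_test  <- round(sample_sizes * test_split)  # assumed rounding rule
  n_train <- sample_sizes - n_test
  sample_sizes <= n_rows & n_test >= min_per_set & n_train >= min_per_set
}

valid_sample_sizes(c(100, 150, 500), n_rows = 400)
# FALSE TRUE FALSE: 100 leaves only 20 test rows; 500 exceeds n_rows
```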

iterations

Number of bootstrap iterations per sample size (default: 20).

test_split

Proportion of each sampled subset held out for testing (default: 0.2).

grid_bins_1

Number of bins for the first trait axis. If NULL (default), the number is calculated automatically using Scott's rule via optimal_bins().

grid_bins_2

Number of bins for the second trait axis. If NULL (default), the number is calculated automatically using Scott's rule via optimal_bins().
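Scott's rule sets a bin width proportional to the standard deviation times n^(-1/3) and divides the data range by that width. The sketch below assumes optimal_bins() follows this textbook form; the constant 3.5 matches base R's grDevices::nclass.scott() (Scott's original derivation gives 3.49). The helper name scott_bins() is hypothetical, and dropping non-finite values is an added assumption.

```r
# Hypothetical sketch of Scott's rule for a bin count (not the package's code).
# Bin width h = 3.5 * sd(x) * n^(-1/3); number of bins = ceiling(range / h).
scott_bins <- function(x) {
  x <- x[is.finite(x)]                         # drop NA/NaN/Inf before computing
  h <- 3.5 * stats::sd(x) * length(x)^(-1/3)   # Scott's bin width
  if (is.na(h) || h <= 0) return(1L)           # constant or too-short input: one bin
  as.integer(max(1, ceiling(diff(range(x)) / h)))
}

scott_bins(1:100)               # 5
grDevices::nclass.scott(1:100)  # base R's implementation of the same rule
```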

transform_fun

Optional function applied to the environmental variable before modeling, e.g., function(x) log(x + 1). The default, NULL, applies no transformation.
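A minimal sketch of how an optional transformation argument like this is commonly handled; apply_transform() is hypothetical and not taken from the package source. The log(x + 1) transform used in the Examples keeps zero values finite, since log(0) is -Inf.

```r
# Hypothetical helper (not part of commecometrics): apply transform_fun if given.
apply_transform <- function(x, transform_fun = NULL) {
  if (is.null(transform_fun)) return(x)   # NULL: leave the variable unchanged
  stopifnot(is.function(transform_fun))   # anything else must be a function
  transform_fun(x)
}

precip <- c(0, 9, 99)
apply_transform(precip)                          # unchanged: 0 9 99
apply_transform(precip, function(x) log(x + 1))  # log(1), log(10), log(100)
```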

parallel

Logical; whether to use parallel processing (default: TRUE).

n_cores

Number of cores to use for parallel processing (default: parallel::detectCores() - 1).

Details

Four base R plots are generated to visualize model performance as a function of sample size:

  1. Training correlation vs. sample size: how well the model fits the training data.

  2. Testing correlation vs. sample size: how well the model generalizes to held-out test data.

  3. Training mean anomaly vs. sample size: average prediction error on the training data.

  4. Testing mean anomaly vs. sample size: average prediction error on the test data.
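A layout like the four panels described above can be reproduced from a summary table with base graphics. This is a hypothetical sketch: plot_sensitivity() and the column names in the example call are placeholders, not the package's own plotting code or output names.

```r
# Hypothetical helper (not part of commecometrics): draw one base R panel per
# metric against sample size in a 2 x 2 layout. Column names are placeholders.
plot_sensitivity <- function(summary_df, size_col, metric_cols) {
  old_par <- par(mfrow = c(2, 2))  # 2 x 2 grid of panels
  on.exit(par(old_par))            # restore the previous layout when done
  for (col in metric_cols) {
    plot(summary_df[[size_col]], summary_df[[col]], type = "b",
         xlab = "Sample size", ylab = col,
         main = paste(col, "vs. sample size"))
  }
  invisible(NULL)
}

# Example call, assuming a summary data frame with these (hypothetical) columns:
# plot_sensitivity(summary_df, "sample_size",
#                  c("train_cor", "test_cor", "train_anomaly", "test_anomaly"))
```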

When parallel = TRUE, the computation is distributed across n_cores cores to reduce run time.
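One common way to distribute independent iterations is base R's parallel package. The sketch below is hypothetical (run_iterations() is not part of commecometrics, and the package may split the work differently); it also shows why the default n_cores = parallel::detectCores() - 1 needs a guard, since it is 0 on a single-core machine and detectCores() can return NA.

```r
library(parallel)

# Hypothetical helper (not part of commecometrics): run one function per
# iteration, either sequentially or on a PSOCK cluster.
run_iterations <- function(iteration_fun, iterations, parallel = TRUE,
                           n_cores = parallel::detectCores() - 1) {
  # Clamp to at least one core (handles NA and 0).
  n_cores <- max(1L, as.integer(n_cores), na.rm = TRUE)
  if (!parallel || n_cores == 1L) {
    return(lapply(seq_len(iterations), iteration_fun))  # sequential fallback
  }
  cl <- makeCluster(n_cores)  # socket cluster; works on Windows, macOS, Linux
  on.exit(stopCluster(cl))    # always shut the workers down
  parLapply(cl, seq_len(iterations), iteration_fun)
}

squares <- run_iterations(function(i) i^2, iterations = 4, parallel = FALSE)
unlist(squares)  # 1 4 9 16
```

A PSOCK cluster starts fresh R sessions, so any global objects that iteration_fun uses must be exported first (e.g., with clusterExport()); results come back in iteration order either way.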

Value

A list containing:

combined_results

A data frame with mean absolute anomalies and correlations for each sample size and iteration.
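Each row of combined_results can be thought of as the output of a scoring step like the one below, applied once to the training set and once to the testing set. This is a hypothetical sketch: evaluate_predictions() is not a package function, and treating an anomaly as observed minus predicted and the correlation as Pearson's r are assumptions.

```r
# Hypothetical helper (not part of commecometrics): score one set of predictions.
# The sign of the anomaly does not matter once absolute values are taken.
evaluate_predictions <- function(observed, predicted) {
  ok <- is.finite(observed) & is.finite(predicted)  # skip missing predictions
  anomaly <- observed[ok] - predicted[ok]
  c(mean_abs_anomaly = mean(abs(anomaly)),
    correlation      = stats::cor(observed[ok], predicted[ok]))  # Pearson by default
}

scores <- evaluate_predictions(observed  = c(1, 2, 3, 4),
                               predicted = c(1.5, 2, 2.5, 4))
scores["mean_abs_anomaly"]  # 0.25
```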

summary_results

A data frame summarizing the mean anomalies and correlations across sample sizes.
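Assuming the summary averages each metric over iterations within a sample size, it can be reproduced from per-iteration results with stats::aggregate(). The helper and column names below are hypothetical placeholders, not the package's actual output columns.

```r
# Hypothetical helper (not part of commecometrics): average per-iteration
# metrics within each sample size. Column names are placeholders.
summarize_by_size <- function(combined, size_col = "sample_size",
                              drop_cols = "iteration") {
  metric_cols <- setdiff(names(combined), c(size_col, drop_cols))
  # One output row per sample size; grouping column comes first.
  stats::aggregate(combined[metric_cols], by = combined[size_col], FUN = mean)
}
```

Swapping FUN = mean for sd (in a second call) would give the spread across iterations, which can be useful alongside the means.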

Examples


# Load example datasets bundled with commecometrics
data("geoPoints", package = "commecometrics")
data("traits", package = "commecometrics")
data("spRanges", package = "commecometrics")

# Summarize trait values at sampling points
traitsByPoint <- summarize_traits_by_point(
  points_df = geoPoints,
  trait_df = traits,
  species_polygons = spRanges,
  trait_column = "RBL",
  species_name_col = "sci_name",
  continent = FALSE,
  parallel = FALSE
)

# Run sensitivity analysis using annual precipitation
sensitivityResults <- sensitivity_analysis(
  points_df = traitsByPoint$points,
  env_var = "precip",
  sample_sizes = seq(40, 90, 10),
  iterations = 5,
  transform_fun = function(x) log(x + 1),
  parallel = FALSE  # Set to TRUE for faster performance on multicore machines
)

# View results
head(sensitivityResults$summary_results)


commecometrics documentation built on Aug. 8, 2025, 6:10 p.m.