pca_multithreshold: Compute Principal Component Analysis at multiple distance...
In spatialRF: Easy Spatial Modeling with Random Forest

pca_multithreshold

R Documentation

Compute Principal Component Analysis at multiple distance thresholds

Description

Computes principal components of a distance matrix at multiple distance thresholds to generate multi-scale spatial predictors for rf_spatial(). Each distance threshold defines a different neighborhood scale, and PCA is applied to the weighted distance matrix at each scale.

Usage

pca_multithreshold(
  distance.matrix = NULL,
  distance.thresholds = NULL,
  max.spatial.predictors = NULL
)

Arguments

`distance.matrix`	Numeric distance matrix between observations.
`distance.thresholds`	Numeric vector of distance thresholds defining different neighborhood scales. Each threshold specifies the maximum distance for spatial neighbors at that scale. If `NULL`, automatically computed with `default_distance_thresholds()`. Default: `NULL`.
`max.spatial.predictors`	Integer specifying the maximum number of spatial predictors to retain. If the total number of generated predictors exceeds this value, only the first `max.spatial.predictors` are kept (ordered by variance explained). Useful for managing memory when `distance.matrix` is very large. Default: `NULL` (keeps all predictors).

Details

This function generates multi-scale spatial predictors by applying PCA to distance matrices at different neighborhood scales. The process for each distance threshold:

1. Converts the distance matrix to weights using weights_from_distance_matrix(), where distances above the threshold are set to zero.
2. Applies pca() to the weighted distance matrix to extract principal components.
3. Names the resulting predictors with the distance threshold for identification.
4. Filters out predictors with all near-zero values.
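
The steps above can be sketched in base R. This is only an illustration under stated assumptions, not the spatialRF implementation: the helper name pca_at_threshold(), the 1 / (1 + distance) weighting, the zeroed diagonal, the use of prcomp(), and the tolerance value are all assumptions chosen for the example. The package itself relies on weights_from_distance_matrix() and pca(), whose internals may differ.

```r
# Illustrative sketch only; NOT the spatialRF implementation.
# pca_at_threshold() is a hypothetical helper introduced for this example.
pca_at_threshold <- function(d, threshold, tolerance = 1e-8) {
  # Step 1: turn distances into weights (assumed form: closer pairs weigh more)
  w <- 1 / (1 + d)
  w[d > threshold] <- 0  # pairs farther apart than the threshold are not neighbors
  diag(w) <- 0           # an observation is not its own neighbor

  # Step 2: principal components of the weighted matrix
  pcs <- prcomp(w, center = TRUE, scale. = FALSE)$x

  # Step 3: encode the threshold and the component rank in the column names
  colnames(pcs) <- paste("spatial_predictor", threshold, seq_len(ncol(pcs)), sep = "_")

  # Step 4: drop components whose values are all near zero
  keep <- apply(abs(pcs), 2, max) > tolerance
  as.data.frame(pcs[, keep, drop = FALSE])
}

# Toy data: 10 random points in the unit square
set.seed(1)
xy <- cbind(x = runif(10), y = runif(10))
d <- as.matrix(dist(xy))

# One block of predictors per threshold, bound column-wise
predictors <- do.call(
  cbind,
  lapply(c(0.25, 0.5), function(th) pca_at_threshold(d, th))
)
dim(predictors)
```

Each threshold contributes its own block of columns, so the combined data frame carries one set of components per neighborhood scale.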

Multi-scale spatial modeling:

Different distance thresholds capture spatial patterns at different scales. Combining predictors from multiple thresholds allows rf_spatial() to account for spatial autocorrelation operating at multiple spatial scales simultaneously. This is analogous to mem_multithreshold() but uses PCA instead of Moran's Eigenvector Maps.

Comparison with MEMs:

Both pca_multithreshold() and mem_multithreshold() generate spatial predictors from distance matrices, but differ in their approach:

PCA: Captures the main patterns of variation in the weighted distance matrix without considering spatial autocorrelation structure
MEMs: Explicitly extracts spatial patterns with specific autocorrelation scales (positive and negative eigenvalues)

In practice, MEMs are generally preferred for spatial modeling because they explicitly target spatial autocorrelation patterns, but PCA can serve as a simpler alternative or for comparison.

Value

Data frame where each column is a spatial predictor derived from PCA at a specific distance threshold. Columns are named with the pattern `spatial_predictor_<distance>_<number>` (e.g., "spatial_predictor_1000_1", "spatial_predictor_5000_2"), where `<distance>` is the distance threshold and `<number>` is the principal component rank. The number of rows matches the number of observations in `distance.matrix`.
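
Because the threshold and the component rank are encoded in the column names, they can be recovered with a regular expression. The sketch below uses base R only and assumes the thresholds appear in the names as plain numbers, as in the examples above; the vector predictor_names is made up for illustration and would normally come from colnames() of the returned data frame.

```r
# Hypothetical column names following the documented pattern
predictor_names <- c(
  "spatial_predictor_1000_1",
  "spatial_predictor_1000_2",
  "spatial_predictor_5000_2"
)

# Capture <distance> and <number> from spatial_predictor_<distance>_<number>
parts <- regmatches(
  predictor_names,
  regexec("^spatial_predictor_(.+)_([0-9]+)$", predictor_names)
)

# Element 1 of each match is the full string; 2 and 3 are the captured groups
scale_info <- data.frame(
  name = predictor_names,
  threshold = as.numeric(vapply(parts, `[`, character(1), 2)),
  rank = as.integer(vapply(parts, `[`, character(1), 3))
)

# Number of predictors available at each distance threshold
table(scale_info$threshold)
```

A lookup table like scale_info makes it straightforward to subset predictors by scale, for example to keep only the columns generated at one threshold.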

Examples


# Load the package and the example distance matrix
library(spatialRF)
data(plants_distance)

# Compute PCA spatial predictors at multiple distance thresholds
pca_predictors <- pca_multithreshold(
  distance.matrix = plants_distance,
  distance.thresholds = c(0, 1000, 5000)
)

# View structure
head(pca_predictors)
dim(pca_predictors)

# Check predictor names (show scale information)
colnames(pca_predictors)[1:6]

# Limit number of predictors to save memory
pca_limited <- pca_multithreshold(
  distance.matrix = plants_distance,
  distance.thresholds = c(0, 1000, 5000),
  max.spatial.predictors = 20
)
ncol(pca_limited)  # At most 20 predictors

spatialRF documentation built on Dec. 20, 2025, 1:07 a.m.