shap_spatial_response: Calculate Shapley value-based spatial response.

View source: R/shap_spatial_response.R

shap_spatial_response R Documentation

Calculate Shapley value-based spatial response.

Description

Calculate SHAP-based spatial response maps. They help to diagnose both how and where the species responds to environmental variables.

Usage

shap_spatial_response(
  model,
  var_occ,
  variables,
  target_vars = NULL,
  shap_nsim = 10,
  seed = 10,
  pfun = .pfun_shap
)

Arguments

model

(isolation_forest or other model) The fitted model. It can be the model item of a POIsotree object made by function isotree_po, or any other user-fitted model as long as pfun can work on it.

var_occ

(data.frame, tibble) A data.frame-style table that includes the values of the environmental variables at occurrence locations.

variables

(stars) The stars object of environmental variables. It should have multiple attributes rather than dims, that is, one attribute per variable. If you have a raster object instead, you can use st_as_stars to convert it to stars, or use read_stars to read the source data directly as stars. You can also use the variables item of a POIsotree object made by function isotree_po.

target_vars

(a vector of character) The selected variables to process. If it is NULL, all variables will be used.

shap_nsim

(integer) The number of Monte Carlo repetitions used by the SHAP method to estimate each Shapley value. See the documentation of function explain in package fastshap for details. When the number of variables is large, a smaller shap_nsim could be used. Be aware that computing the SHAP-based spatial response is slow because the Monte Carlo computation runs for every pixel, but it is worth the time because the result is much more informative. The default is 10. A value of 10 to 20 is usually enough.

seed

(integer) The seed for any random process. The default is 10L.

pfun

(function) The predict function, which must take two arguments, object and newdata. It is only required when model is not an isolation_forest. The default is the wrapper function designed for the iForest model in itsdm.
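As an illustration of the pfun contract described above, the following is a minimal sketch of a prediction wrapper for a model other than isolation_forest. It uses a logistic regression fitted with glm from base R on made-up data; the names toy, mod_glm and pfun_glm are hypothetical and exist only for this sketch. It assumes that a suitable pfun returns one numeric value per row of newdata, and it only shows the wrapper itself, not a call to shap_spatial_response.

```r
# Toy data standing in for environmental values and occurrences
# (made up for illustration; not from the itsdm datasets).
set.seed(1)
toy <- data.frame(bio1 = rnorm(50), bio12 = rnorm(50))
toy$occ <- rbinom(50, 1, plogis(toy$bio1))

# A user-fitted model that is not an isolation_forest.
mod_glm <- glm(occ ~ bio1 + bio12, data = toy, family = binomial())

# Hypothetical wrapper: takes the fitted model and a data.frame of
# new observations, returns one predicted probability per row.
pfun_glm <- function(object, newdata) {
  as.numeric(predict(object, newdata = newdata, type = "response"))
}

# Check the wrapper on a few rows before using it elsewhere.
p <- pfun_glm(mod_glm, toy[1:5, c("bio1", "bio12")])
```

Checking the wrapper on a handful of rows first is a cheap way to confirm that it returns a plain numeric vector of the expected length.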

Details

The values show how each environmental variable affects the model prediction across space. These maps can help to answer where a given environmental variable raises or lowers the prediction.
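To make the role of shap_nsim more concrete, here is a minimal base R sketch of a Monte Carlo approximation of one Shapley value for one observation. It is an illustrative assumption about the general sampling technique, not the code used by itsdm or fastshap; the function mc_shapley and all of its arguments are hypothetical names introduced for this sketch.

```r
# Monte Carlo estimate of the Shapley value of one feature for one
# observation (illustrative only; not the itsdm or fastshap code).
# predict_fun: function taking a data.frame and returning predictions
# background:  data.frame of reference observations
# x:           one-row data.frame, the observation to explain
mc_shapley <- function(predict_fun, background, x, feature,
                       nsim = 10, seed = 10) {
  set.seed(seed)
  features <- names(background)
  contrib <- numeric(nsim)
  for (i in seq_len(nsim)) {
    # Draw a random background row and a random feature ordering
    w <- background[sample(nrow(background), 1), , drop = FALSE]
    ord <- sample(features)
    pos <- match(feature, ord)
    # Features that come after `feature` take the background values
    after <- ord[seq_along(ord) > pos]
    with_f <- x
    for (v in after) with_f[[v]] <- w[[v]]
    # Same observation, but `feature` also takes the background value
    without_f <- with_f
    without_f[[feature]] <- w[[feature]]
    contrib[i] <- predict_fun(with_f) - predict_fun(without_f)
  }
  # Average marginal contribution over the nsim repetitions
  mean(contrib)
}

# Toy usage with an additive model, where the result is exact
toy_predict <- function(d) 2 * d$a + 3 * d$b
background <- data.frame(a = c(0, 0, 0), b = c(1, 2, 3))
x <- data.frame(a = 5, b = 1)
phi_a <- mc_shapley(toy_predict, background, x, feature = "a")
phi_a  # 10
```

In a sketch like this each repetition costs two model predictions per variable and per observation, which is why repeating it for every pixel of a raster is slow and why a small number of repetitions is attractive.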

Value

(SHAPSpatial) A list of stars objects holding the SHAP-based spatial response of all variables.

See Also

spatial_response

Examples

# Using a pseudo presence-only occurrence dataset of
# virtual species provided in this package
library(dplyr)
library(sf)
library(stars)
library(itsdm)

# Prepare data
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"

# Format the observations
obs_train_eval <- format_observation(
  obs_df = obs_df, eval_df = eval_df,
  x_col = x_col, y_col = y_col, obs_col = obs_col,
  obs_type = "presence_only")

env_vars <- system.file(
  'extdata/bioclim_tanzania_10min.tif',
  package = 'itsdm') %>% read_stars() %>%
  slice('band', c(1, 5, 12))

# Fit the model in imperfect_presence mode
mod <- isotree_po(
  obs_mode = "imperfect_presence",
  obs = obs_train_eval$obs,
  obs_ind_eval = obs_train_eval$eval,
  variables = env_vars, ntrees = 10,
  sample_size = 0.8, ndim = 2L,
  seed = 123L, nthreads = 1,
  response = FALSE,
  spatial_response = FALSE,
  check_variable = FALSE)

# SHAP-based spatial response for all variables
shap_spatial <- shap_spatial_response(
  model = mod$model,
  var_occ = mod$vars_train,
  variables = mod$variables,
  shap_nsim = 1)

# SHAP-based spatial response for selected variables only
shap_spatial <- shap_spatial_response(
  model = mod$model,
  target_vars = c("bio1", "bio12"),
  var_occ = mod$vars_train,
  variables = mod$variables,
  shap_nsim = 1)

## Not run: 
##### Use Random Forest model as an external model ########
library(randomForest)

# Prepare data
data("occ_virtual_species")
obs_df <- occ_virtual_species %>%
  filter(usage == "train")

env_vars <- system.file(
  'extdata/bioclim_tanzania_10min.tif',
  package = 'itsdm') %>% read_stars() %>%
  slice('band', c(1, 5, 12)) %>%
  split()

model_data <- stars::st_extract(
  env_vars, at = as.matrix(obs_df %>% select(x, y))) %>%
  as.data.frame()
names(model_data) <- names(env_vars)
model_data <- model_data %>%
  mutate(occ = obs_df[['observation']])
model_data$occ <- as.factor(model_data$occ)

mod_rf <- randomForest(
  occ ~ .,
  data = model_data,
  ntree = 200)

pfun <- function(X.model, newdata) {
  # newdata is a data.frame; return the predicted probability of class "1"
  predict(X.model, newdata, type = "prob")[, "1"]
}

shap_spatial <- shap_spatial_response(
  model = mod_rf,
  target_vars = c("bio1", "bio12"),
  var_occ = model_data %>% select(-occ),
  variables = env_vars,
  shap_nsim = 10,
  pfun = pfun)

## End(Not run)


itsdm documentation built on July 9, 2023, 6:45 p.m.