machu.top.env: Select top environmental variables
In wxguillo/machuruku: Ancestral niche reconstruction

machu.top.env

R Documentation

Select top environmental variables

Description

Perform a boosted regression tree analysis to identify the most important climate variables for your taxon set.

Usage

machu.top.env(
  occ,
  clim,
  sp.col = 1,
  col.xy = 2:3,
  learning.rt = 0.01,
  steps = 50,
  method = "contrib",
  nvars.save = 5,
  contrib.greater = 5,
  pa.ratio = 4,
  verbose = F
)

Arguments

`occ`	Occurrence data for all taxa. Identical to input for machu.1.tip.resp(). Dataframe, with columns in the order of species, x/long, y/lat.
`clim`	Climate data for all taxa. Identical to input for machu.1.tip.resp(). A RasterStack of corresponding climate variables. A SpatRaster (from terra) is also acceptable.
`sp.col`	Specify which column of the input occurrence data corresponds to species ID. Default = 1.
`col.xy`	Vector specifying the long (x) and lat (y) columns of the occurrence data. Default = 2:3.
`learning.rt`	Learning rate for building the ENMs, from 0.001 to 0.01. Start with 0.01 and, if prompted, change to 0.001. Default = 0.01.
`steps`	Number of trees to add at each cycle when modelling each taxon. Start with 50 and, if you run into problems, gradually decrease it, stopping at 1. Default = 50.
`method`	Determines how important environmental variables are selected. There are three options: "estimate", "contrib", and "nvars". If method="estimate", the boosted regression tree algorithm chooses the number of variables to include by systematically removing variables until the average change in the model exceeds the original standard error of deviance explained. This is the most computationally intensive method. If method="contrib", variables at or above a relative influence value are kept. See the associated parameter 'contrib.greater'. If method="nvars", a fixed, user-specified number of variables is kept, selected by relative influence. See the associated parameter 'nvars.save'. The 'nvars.save' highest-contributing variables for each taxon are retained and pooled, the pool is ranked, and the 'nvars.save' highest-contributing variables in the pool are retained.
`nvars.save`	Required if method="nvars". The number of top variables to save. Variables are kept according to their relative influence in predicting the species' distribution, highest contribution first. The total number of variables retained is often lower because the same variables are selected for more than one taxon. Default = 5. Ignored if method="estimate" or "contrib".
`contrib.greater`	Required if method="contrib". Variables are kept according to their relative influence in predicting the species' distribution: every variable with a model contribution equal to or above this value is kept. Default = 5 (variables contributing 5 percent or more to the model of any taxon are kept). Ignored if method="estimate" or "nvars".
`pa.ratio`	Ratio of pseudoabsences to occurrence points. Default = 4, which is the typical value. The model needs at least 50 total points (occ + pseudoabsences) to work. If the sum is below 50, the difference is used as the number of pseudoabsences instead of occ*pa.ratio.
`verbose`	If TRUE, print progress to the screen. Default = F.
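
The pseudoabsence rule described for `pa.ratio` can be sketched in a few lines of base R. This is only an illustration of the arithmetic as worded above, not the package's internal code: the helper `n.pseudoabsences()` and its `min.total` argument are hypothetical names, and reading "the difference" as 50 minus the number of occurrences is an assumption.

```r
# Hypothetical helper, not part of machuruku: number of pseudoabsences
# implied by the pa.ratio rule (assumed reading of the documentation).
n.pseudoabsences <- function(n.occ, pa.ratio = 4, min.total = 50) {
  n.pa <- n.occ * pa.ratio
  # If occurrences + pseudoabsences fall short of the minimum,
  # top up with pseudoabsences until the total reaches min.total.
  if (n.occ + n.pa < min.total) {
    n.pa <- min.total - n.occ
  }
  n.pa
}

n.pseudoabsences(30)  # 30 + 120 = 150 total points, so the ratio is used
n.pseudoabsences(6)   # 6 + 24 = 30 < 50, so 50 - 6 pseudoabsences are used
```

Under this reading, taxa with fewer than 10 occurrences at the default ratio end up with a pseudoabsence-to-occurrence ratio higher than 4.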

Details

This function is a modified version of humboldt.top.env() from the package Humboldt. It runs generalized boosted regression models (a machine learning ENM algorithm) to select top parameters for inclusion in your analyses. This is important because you want the models to reflect variables that are relevant to the species' distribution. Alternatively, you can run Maxent outside of R and manually curate the variables you include (also recommended).
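
To make the "contrib" and "nvars" selection rules concrete, here is a minimal base R sketch that applies them to a made-up table of relative influence values. It does not fit any boosted regression trees and is not the package's implementation: the data, the variable names, and the helpers `select.contrib()` and `select.nvars()` are all hypothetical, and ranking pooled variables by their highest influence in any taxon is an assumption about how the pool is ranked.

```r
# Made-up relative influence values (percent) for two hypothetical taxa.
infl <- data.frame(
  taxon = rep(c("sp1", "sp2"), each = 4),
  var = rep(c("bio1", "bio4", "bio12", "bio15"), times = 2),
  rel.inf = c(60, 25, 12, 3, 48, 45, 2, 5),
  stringsAsFactors = FALSE
)

# method = "contrib": keep every variable whose influence is at or above
# the threshold in the model of at least one taxon.
select.contrib <- function(infl, contrib.greater = 5) {
  unique(infl$var[infl$rel.inf >= contrib.greater])
}

# method = "nvars": keep the top n variables per taxon, pool them,
# rank the pool, and keep the top n variables overall.
select.nvars <- function(infl, nvars.save = 5) {
  per.taxon <- lapply(split(infl, infl$taxon), function(d) {
    head(d[order(d$rel.inf, decreasing = TRUE), ], nvars.save)
  })
  pool <- do.call(rbind, per.taxon)
  # Assumption: a pooled variable is ranked by its highest influence in any taxon.
  best <- vapply(split(pool$rel.inf, pool$var), max, numeric(1))
  names(head(sort(best, decreasing = TRUE), nvars.save))
}

print(select.contrib(infl, contrib.greater = 10))
print(select.nvars(infl, nvars.save = 3))
```

In this toy table both rules keep the same three variables, which also illustrates why the number of variables retained can be smaller than the sum of the per-taxon selections.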

Value

Prints the important climate variables to the screen. You can then combine them into a new RasterStack or SpatRaster object.
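
One way to build the reduced climate object is to subset the original one by layer name. The sketch below assumes the terra package is installed and uses a small synthetic SpatRaster. The layer names and the contents of `top.vars` are placeholders for whatever names the function prints for your data.

```r
# Runs only if terra is available; the raster here is synthetic toy data.
if (requireNamespace("terra", quietly = TRUE)) {
  clim <- terra::rast(nrows = 10, ncols = 10, nlyrs = 4, vals = runif(400))
  names(clim) <- c("bio1", "bio4", "bio12", "bio15")

  # Placeholder: replace with the variable names printed by machu.top.env()
  top.vars <- c("bio1", "bio12")

  # Subset layers by name to get a SpatRaster holding only the top variables
  clim.top <- clim[[top.vars]]
  print(names(clim.top))
}
```

The same `[[` subsetting by layer name should also work on a RasterStack from the raster package.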

Examples

## acceptable 'clim' formats
# Single RasterStack (raster package) (preferred)
clim <- stack(list.files(rasterfolder, pattern="T0_", full.names=TRUE))
# Single SpatRaster (terra package)
clim <- c(rast(list.files(rasterfolder, pattern="T0_", full.names=TRUE)))

# identify the top 6 climate variables across all taxa
machu.top.env(occ, clim, method = "nvars", nvars.save = 6)
# identify all climate variables with a contribution of 10% or greater
machu.top.env(occ, clim, method = "contrib", contrib.greater = 10)

wxguillo/machuruku documentation built on Jan. 23, 2025, 3:25 p.m.