sdm: Fit and evaluate species distribution models

sdm R Documentation

Fit and evaluate species distribution models

Description

Fits species distribution models (SDMs) for one or more species using one or more methods specified by the user in the methods argument, and evaluates their performance.

Usage

sdm(formula, data, methods, ...)

Arguments

formula

Specifies the structure of the model, types of features, etc.

data

An sdmdata object created using the sdmData function.

methods

Character. Specifies the methods used to fit the models.

...

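Additional arguments (e.g., replication, test.percent, cv.folds, n, modelSettings, parallelSetting; see Details and Examples).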

Details

sdm fits multiple models and can be used to generate multiple runs (replicates) of each method through partitioning, using one or several partitioning methods: subsampling, cross-validation, and bootstrapping.

Each model is evaluated against the training data and, if available, against held-out data produced by partitioning (also called dependent test data, i.e., "dep.test") and/or independent test data ("indep.test").
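As a rough illustration of what the three partitioning ideas mean, the following base-R sketch splits a set of record indices in each of the three ways. It is not the sdm package's internal code; the object names and the toy sample size are hypothetical and chosen only for this example.

```r
# Illustrative only: base-R versions of the three partitioning ideas.
# This is NOT how sdm implements them internally; all names are hypothetical.
set.seed(1)
n_records <- 100                 # toy number of records
ids <- seq_len(n_records)

# Subsampling: hold out a fixed percentage (here 30) as dependent test data
test_sub  <- sample(ids, size = round(0.30 * n_records))
train_sub <- setdiff(ids, test_sub)

# 5-fold cross-validation: each record is assigned to exactly one fold
fold <- sample(rep(1:5, length.out = n_records))
test_cv1  <- ids[fold == 1]      # test records when fold 1 is held out
train_cv1 <- ids[fold != 1]

# Bootstrapping: draw with replacement; records never drawn form the test set
train_boot <- sample(ids, size = n_records, replace = TRUE)
test_boot  <- setdiff(ids, train_boot)

length(test_sub)                 # 30 records held out by subsampling
table(fold)                      # 20 records in each of the 5 folds
```

In each scheme the held-out records play the role of the dependent test data, while the remaining (or resampled) records are used for training.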

Users should make sure that the methods are available and that the packages they require are installed before passing their names to the function; any method that cannot be run, for whatever reason, is excluded by the function. It is good practice to call the installAll function once, right after sdm is installed; it tries to install all the packages that may be needed anywhere in the sdm package.
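One way to check in advance whether a set of packages is installed is sketched below using base R only. The helper name missing_packages is hypothetical, and which packages a given method needs is not stated here, so the package names passed in are placeholders that you would replace with your own.

```r
# A minimal sketch: report which of the given packages are not installed.
# 'missing_packages' is a hypothetical helper, not part of the sdm package.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# 'stats' ships with R, so nothing is reported as missing here
missing_packages("stats")

# After installing sdm, its own helper can be called once to install
# the packages it may need (left commented out because it installs software):
# sdm::installAll()
```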

A user can adapt a new method and add it to the package using the add function. It is also possible to get an instance of an existing method, override its settings and definition, and then add it under a new name (e.g., my.glm).

The output is a single object (of class sdmModels) that can be read and reproduced anywhere (e.g., on a new machine). A settings object can also be exported from the output sdmModels object and used to repeat the same workflow under new conditions (e.g., a new dataset or study area).

To speed up model fitting, you may use parallel processing (a high-performance computing solution) by passing a list of items to the parallelSetting argument. The items in the list include:

ncore: defines the number of cores (it can also be specified outside of this list)

method: defines the parallelising engine. Currently, three options are available: 'parallel', 'foreach', and 'future'. The default is 'parallel'.

doParallel: Optional; an expression that registers a backend for parallel processing (needed when method='foreach'). It should be provided as an R expression, as in the following example:

expression(registerDoParallel(parallelSetting@cl))

The above example is based on a function available in the doParallel package. Other packages (e.g., doMC) can also be used to provide and register a backend.

cluster: Optional; if a cluster has already been created and is available (e.g., using cl <- parallel::makeCluster(2)), the cluster object can be supplied here to be used as the parallel processing engine; otherwise, the cluster is handled by the sdm package.

hosts: Optional; to use remote machines/clusters in the parallel processing, a character vector with the addresses (names or IPs) of the remote clusters accessible on the network can be provided here to be registered and used (still under development, so it may not work properly!)

fork: Logical; available on non-Windows operating systems, and specifies whether a fork solution should be used for the parallelisation. Default is TRUE on non-Windows systems and FALSE on Windows.

NOTE: Only use parallelSetting when you deal with a big dataset or a large number of models; if the procedure is already quick without parallelising, it makes the procedure slower rather than faster!
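The items listed above can be combined into a list as in the following sketch. The item names come from the description above; the wrapper function fit_in_parallel, the choice of two cores, and the particular methods and replication settings are assumptions made only for illustration. The wrapper is defined but not called here, and it expects an sdmdata object such as d from the Examples.

```r
# A sketch of parallelSetting lists built from the items described above.

# Default engine ('parallel') with two cores
ps <- list(
  ncore  = 2,
  method = "parallel"
)

# Variant for method = 'foreach': the backend registration is supplied as an
# unevaluated R expression, copied from the example given above
ps_foreach <- list(
  ncore      = 2,
  method     = "foreach",
  doParallel = expression(registerDoParallel(parallelSetting@cl))
)

# Optionally, a cluster created by the user could be supplied instead:
# cl <- parallel::makeCluster(2)
# ps$cluster <- cl

# Hypothetical wrapper showing where the list is passed; 'd' is an sdmdata
# object created with sdmData (see Examples). Requires the sdm package.
fit_in_parallel <- function(d, ps) {
  sdm::sdm(sp ~ b15 + NDVI, data = d,
           methods = c("glm", "gam", "gbm"),
           replication = "sub", test.percent = 30, n = 10,
           parallelSetting = ps)
}
```

Calling fit_in_parallel(d, ps) would then fit the models using the settings in ps. Keeping the settings in a named list makes it easy to switch engines without touching the rest of the call.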

Value

An object of class sdmModels.

Author(s)

Babak Naimi naimi.b@gmail.com

https://www.r-gis.net/

https://www.biogeoinformatics.org/

References

Naimi, B., Araujo, M.B. (2016) sdm: a reproducible and extensible R platform for species distribution modelling, Ecography, 39:368-375, DOI: 10.1111/ecog.01881

Examples

## Not run: 
file <- system.file("external/pa_df.csv", package="sdm")

df <- read.csv(file)

head(df) 

d <- sdmData(sp~b15+NDVI,train=df)

d
#----
# Example 1: fit using 3 models, with no partitioning (evaluation is based on the training dataset only):

m <- sdm(sp~b15+NDVI,data=d,methods=c('glm','gam','gbm'))

m

# Example 2: fit using 5 models, and 
# evaluate using 10 runs of subsampling replication, taking 30 percent as test data:

m <- sdm(sp~b15+NDVI,data=d,methods=c('glm','gam','gbm','svm','rf'),
          replication='sub',test.percent=30,n=10)

m


# Example 3: fit using 5 models, and 
# evaluate using 10 runs of both 5-fold cross-validation and bootstrapping replication methods:

m <- sdm(sp~.,data=d,methods=c('gbm','tree','mars','mda','fda'),
          replication=c('cv','boot'),cv.folds=5,n=10)

m

# Example 4: fit using 3 models; evaluate the models using subsampling, 
# and override the default settings for the method brt:

m <- sdm(sp~b15+NDVI,data=d,methods=c('glm','gam','brt'),test.p=30,
          modelSettings=list(brt=list(n.trees=500,train.fraction=0.8)))

m


## End(Not run)


babaknaimi/sdm documentation built on May 6, 2024, 1:52 a.m.