sdmSetting: creating sdmSetting object

sdmSetting R Documentation

creating sdmSetting object

Description

Creates an sdmSetting object that holds the settings used to fit and evaluate the models. It can be used to reproduce a study.

Usage

sdmSetting(formula,data,methods,interaction.depth=1,n=1,replication=NULL,cv.folds=NULL,
     test.percent=NULL,bg=NULL,bg.n=NULL,var.importance=NULL,response.curve=TRUE,
     var.selection=FALSE,modelSettings=NULL,seed=NULL,parallelSetting=NULL,...)


Arguments

formula

a formula specifying the structure of the model

data

sdm data object or data.frame including species and feature data

methods

character, name of the algorithms

interaction.depth

level of interactions between predictors

n

number of replicates (runs)

replication

replication method (e.g., 'subsampling', 'bootstrapping', 'cv')

cv.folds

number of folds if cv (cross-validation) is in the selected replication methods

test.percent

test percentage if subsampling is in the selected replication methods

bg

method to generate background

bg.n

number of background records

var.importance

logical, whether variable importance should be calculated

response.curve

logical, whether response curves should be calculated (default is TRUE)

var.selection

logical, whether variable selection should be considered

modelSettings

optional list; settings for modelling methods can be specified by users

seed

default is NULL; either a logical specifying whether a seed for the random number generator should be set, or a number specifying the exact seed

parallelSetting

default is NULL; a list of settings for parallel processing. The items in the list include: ncore, method, type, hosts, doParallel, and fork; see Details for more information.

...

additional arguments

Details

Using sdmSetting, the feature types, interaction.depth, and all other settings of the model can be defined. This function generates an sdmSetting object, which is particularly helpful for reproducibility. The object can be shared with other users and reused in other studies.

To reproduce the same results every time the code is run with the same data and settings, a seed number should be specified. The seed argument accepts: NULL, meaning no seed is set (if random sampling is part of the modelling procedure, the results will differ between runs); TRUE, meaning a seed is set (the seed number is selected randomly and reused every time the same setting is used); or a number, meaning the seed is set to the number specified by the user.
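The effect of the seed options can be sketched as follows. The first part uses only base R to illustrate why a fixed seed makes a random partition repeatable; it is not the internal code of the sdm package. The second part runs only if sdm is installed and passes each option to sdmSetting. The data object d, the choice of 'glm', and the seed value 123 are assumptions made for illustration.

```r
# Base R illustration: resetting the same seed reproduces the same random draw,
# which is what makes a random train/test partition repeatable.
set.seed(123)
first_draw <- sample(1:100, 30)
set.seed(123)
second_draw <- sample(1:100, 30)
same_with_seed <- identical(first_draw, second_draw)  # TRUE

# The three options for 'seed' (run only if the sdm package is available)
if (requireNamespace("sdm", quietly = TRUE)) {
  library(sdm)
  df <- read.csv(system.file("external/pa_df.csv", package = "sdm"))
  d <- sdmData(sp ~ b15 + NDVI, train = df)

  # seed = NULL: no seed is set, so subsampling differs between runs
  s_none <- sdmSetting(sp ~ ., data = d, methods = "glm",
                       replication = "sub", test.percent = 30, n = 5,
                       seed = NULL)

  # seed = TRUE: a seed is chosen randomly and kept with the settings
  s_auto <- sdmSetting(sp ~ ., data = d, methods = "glm",
                       replication = "sub", test.percent = 30, n = 5,
                       seed = TRUE)

  # seed = 123: the seed is fixed to a user-specified number
  s_fixed <- sdmSetting(sp ~ ., data = d, methods = "glm",
                        replication = "sub", test.percent = 30, n = 5,
                        seed = 123)
}
```

A numeric seed is the easiest option to report in a paper or share with collaborators, because the number itself documents how the run can be repeated.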

For parallel processing, a list of items can be passed to parallelSetting, including:

ncore: defines the number of cores (it can also be specified outside of this list)

method: defines the parallelising engine. Three options are currently available: 'parallel', 'foreach', and 'future'. Default is 'parallel'.

doParallel: Optional; an expression that registers a backend for parallel processing (needed when method='foreach'). It should be provided as an R expression, as in the following example:

expression(registerDoParallel(parallelSetting@cl))

The above example uses a function from the doParallel package. Other packages can also be used to provide and register a backend (e.g., doMC).

cluster: Optional; in case a cluster is created and available (e.g., using cl <- parallel::makeCluster(2)), the cluster object can be introduced here to be used as the parallel processing engine, otherwise, it is handled by the sdm package.

hosts: Optional; to use remote machines/clusters in parallel processing, a character vector with the addresses (names or IPs) of remote clusters accessible on the network can be provided here to be registered and used (still under development, so it may not work properly!)

fork: Logical; available on non-Windows operating systems, and specifies whether forking should be used for the parallelisation. Default is TRUE on non-Windows OS and FALSE on Windows.

NOTE: Only use parallelSetting when you deal with a big dataset or a large number of models; otherwise, if the procedure is already quick without parallelising, it makes the procedure slower rather than faster!
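A parallelSetting list built from the items described above might look like the following sketch. The variable names (n_cores, par_settings), the cap of two cores, and the choice of 'glm' with ten replicates are assumptions for illustration; the sdmSetting call runs only if the sdm package is installed.

```r
library(parallel)

# Use at most two cores; detectCores() may return NA on some systems,
# in which case na.rm = TRUE leaves the cap of 2.
n_cores <- min(2L, detectCores(), na.rm = TRUE)

# Items follow the names documented above; 'fork' is set explicitly here
# to the value described as the default for each operating system.
par_settings <- list(
  ncore  = n_cores,
  method = "parallel",
  fork   = .Platform$OS.type != "windows"
)

if (requireNamespace("sdm", quietly = TRUE)) {
  library(sdm)
  df <- read.csv(system.file("external/pa_df.csv", package = "sdm"))
  d <- sdmData(sp ~ b15 + NDVI, train = df)

  # Pass the list through the parallelSetting argument
  s_par <- sdmSetting(sp ~ ., data = d, methods = "glm",
                      replication = "sub", test.percent = 30, n = 10,
                      parallelSetting = par_settings)
}
```

With a dataset as small as this example, the overhead of starting workers is likely to outweigh any gain, in line with the note above.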

Value

an object of class sdmSettings

Author(s)

Babak Naimi naimi.b@gmail.com

https://www.r-gis.net/

https://www.biogeoinformatics.org/

References

Naimi, B., Araujo, M.B. (2016) sdm: a reproducible and extensible R platform for species distribution modelling, Ecography, DOI: 10.1111/ecog.01881

Examples

## Not run: 
library(sdm)

# path to the example data shipped with the package
file <- system.file("external/pa_df.csv", package="sdm")

df <- read.csv(file)

head(df) 

# create the sdm data object
d <- sdmData(sp~b15+NDVI,train=df)

# generate sdmSettings object:
s <- sdmSetting(sp~., data=d, methods=c('glm','gam','brt','svm','rf'),
          replication='sub',test.percent=30,n=10,modelSettings=list(brt=list(n.trees=500)))

s

## End(Not run)

babaknaimi/sdm documentation built on April 4, 2024, 1:45 p.m.