ensemble: Ensemble Forecasting of SDMs

ensemble R Documentation

Ensemble Forecasting of SDMs

Description

Creates a Raster object by weighted averaging over the predictions of several fitted models in an sdmModels object.

Usage

## S4 method for signature 'sdmModels'
ensemble(x, newdata, filename="",setting,overwrite=FALSE,pFilename="",...)

Arguments

x

an sdmModels object

newdata

a raster object or data.frame; it can contain either the predictors or the output of the predict function

filename

optional character, output file name (if newdata is raster object)

setting

list, contains the parameters that are used in the ensemble procedure; see details

overwrite

logical, whether an existing file should be overwritten (applies only if filename is given and the file already exists)

pFilename

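optional character, the file name to which the output of predict is written (if newdata is a raster). When newdata contains predictors, ensemble first calls predict, and this argument names the file for that intermediate output; it is ignored if newdata is already the output of predict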

...

additional arguments passed to the writeRaster function (if used)

Details

The ensemble function uses the fitted models in an sdmModels object to generate an ensemble/consensus of the predictions of multiple individual models. Several ensemble methods are available and can be selected through the setting argument.

The setting argument takes a list that can include the following items:

- method: a character vector specifying which ensemble method(s) should be employed (more than one can be given). Details of the available methods are provided at the end of this section.

- stat: if method='weighted' is used, it specifies which evaluation metric is used as the weight in the weighted averaging procedure. Alternatively, the weights can be supplied directly (see the next item).

- weights: an optional numeric vector (with a length equal to the number of successfully fitted models) specifying the weights for the weighted averaging procedure (if method='weighted' is specified).

- id: specifies the model IDs that should be considered in the ensemble procedure. If missing, all the models that are successfully fitted are considered.

- expr: a character or an expression specifying a condition used to select models for the ensemble procedure. For example, expr='auc > 0.7' uses only models with an AUC greater than 0.7, and expr='auc > 0.7 & tss > 0.5' subsets models based on both the AUC and TSS metrics.

- wtest: specifies which dataset ("training", "test.dep", "test.indep") should be used to extract the statistic (stat) values used as weights (if a relevant method is specified).

- opt: if a threshold-based metric is selected in stat or in expr, opt specifies the threshold selection criterion. Possible values are 1 to 14, corresponding to the "sp=se", "max(se+sp)", "min(cost)", "minROCdist", "max(kappa)", "max(ppv+npv)", "ppv=npv", "max(NMI)", "max(ccr)", "prevalence", "P10", "P5", "P1", "P0" criteria, respectively.

- power: a numeric value to which the weights are raised (default: 1). A value greater than 1 changes the weighting scheme (for methods such as 'weighted') by increasing the relative weight of the models that already have greater weights. For example, if the weights are c(0.2,0.2,0.2,0.4), raising them to the power of 2 (and rescaling them to sum to 1) results in the new weights c(0.1428571, 0.1428571, 0.1428571, 0.5714286), which gives the better-performing models a greater contribution to the ensemble output.
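The numbers in the power example can be checked with plain arithmetic. The following is a minimal base R sketch, not code from the sdm package: the helper raise_weights and the matrix preds_mat are hypothetical names with made-up values, and the rescaling to a sum of 1 is inferred from the numbers quoted above.

```r
# Hypothetical helper (not part of sdm): raise weights to a power and
# rescale them so that they sum to 1.
raise_weights <- function(w, power = 1) {
  w <- w^power
  w / sum(w)
}

w  <- c(0.2, 0.2, 0.2, 0.4)
w2 <- raise_weights(w, power = 2)
round(w2, 7)   # 0.1428571 0.1428571 0.1428571 0.5714286

# Made-up predictions: 3 sites (rows) x 4 models (columns).
preds_mat <- matrix(c(0.1, 0.2, 0.3, 0.9,
                      0.5, 0.5, 0.5, 0.5,
                      0.8, 0.7, 0.6, 0.2),
                    nrow = 3, byrow = TRUE)

# Weighted average per site with power = 1 and with power = 2
ens_p1 <- as.vector(preds_mat %*% raise_weights(w, power = 1))
ens_p2 <- as.vector(preds_mat %*% w2)
```

With power = 2, the fourth model (the one with the largest weight) pulls the ensemble value at each site further towards its own prediction than it does with power = 1.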

The available ensemble methods (to be specified in method) include:

– 'unweighted': unweighted averaging/mean.

– 'weighted': weighted averaging.

– 'median': median.

– 'pa': mean of predicted presence-absence values (predicted probabilities are first converted to presence-absence using a threshold, then averaged; opt defines which threshold optimisation strategy is used).

– 'mean-weighted': a two-step averaging that can be used when several replications are available for each modelling method (e.g., fitted through bootstrapping or cross-validation resampling). It first takes an unweighted mean over the predicted values of the replications of each method (within-model averaging), then uses a weighted mean to combine the probabilities of the different methods (between-model averaging).

– 'mean-unweighted': same as the previous one, but an unweighted mean is also used for the second step (instead of a weighted mean).

– 'median-weighted': same as 'mean-weighted', but the median is used in the first step.

– 'median-unweighted': another two-step method; the median is used for the first step and an unweighted mean for the second step.
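The four two-step methods differ only in the summary used at each step. The following base R sketch illustrates this for a single location; it is not the sdm implementation. The predictions, the method labels, and the per-method weights w are made-up values, and how sdm derives one weight per method is not specified here.

```r
# Made-up predictions at one location: 2 methods x 3 replications each.
pred   <- c(0.60, 0.70, 0.80,    # replications of method "rf"
            0.20, 0.30, 0.70)    # replications of method "svm"
method <- rep(c("rf", "svm"), each = 3)

# Assumed per-method weights (illustrative only, e.g. a performance score).
w <- c(rf = 0.9, svm = 0.6)

# Step 1: summarise the replications within each method.
step1_mean   <- sapply(split(pred, method), mean)
step1_median <- sapply(split(pred, method), median)

# Step 2: combine the per-method values between methods.
mean_weighted     <- weighted.mean(step1_mean,   w[names(step1_mean)])
mean_unweighted   <- mean(step1_mean)
median_weighted   <- weighted.mean(step1_median, w[names(step1_median)])
median_unweighted <- mean(step1_median)
```

Because the third "svm" replication is an outlier, the median-based variants give a lower value than the mean-based ones in this example.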

In addition to the ensemble methods, some other methods are available to generate outputs that represent uncertainty:

– 'uncertainty' or 'entropy': this method generates the uncertainty among the models' predictions, which can be interpreted as model-based uncertainty or inconsistency among different models. It ranges between 0 and 1: 0 means all the models predicted the same value (either presence or absence), and 1 refers to maximum uncertainty, e.g., half of the models predicted presence (or absence) and the other half predicted the opposite value.

– 'cv': Coefficient of variation of probabilities generated from multiple models

– 'stdev': Standard deviation of probabilities generated from multiple models

– 'ci': this generates the confidence interval length (marginal error), which assigns the difference between the upper and lower limits of the confidence interval to each pixel (upper - lower). The default confidence level is 95% (i.e., alpha = 0.05), unless a different alpha is defined in setting. If separate upper and lower rasters are needed, the limits can be calculated with the following code:

en <- ensemble(x, newdata, setting=list(method=c('mean','ci'))) # taking unweighted averaging and ci

# en[[1]] is the mean of all probabilities and en[[2]] is the ci

ci.upper <- en[[1]] + en[[2]] / 2 # adding marginal error (half of the generated ci) to mean

ci.lower <- en[[1]] - en[[2]] / 2 # subtracting marginal error from mean

plot(ci.upper,main='Upper limit of Confidence Interval - alpha = 0.05')

plot(ci.lower,main='Lower limit of Confidence Interval - alpha = 0.05')
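The uncertainty outputs described above can be illustrated on a small matrix of probabilities. The following base R sketch is not the sdm implementation: the entropy formula is an assumption (binary Shannon entropy with log base 2, which matches the stated 0 to 1 range), and the fixed threshold of 0.5 and all data values are made up. In sdm the threshold comes from the criterion selected by opt.

```r
# Made-up predicted probabilities: 3 sites (rows) x 4 models (columns).
p <- matrix(c(0.9, 0.8, 0.7, 0.9,
              0.9, 0.8, 0.2, 0.1,
              0.1, 0.2, 0.1, 0.3),
            nrow = 3, byrow = TRUE)

threshold <- 0.5                  # assumed fixed threshold (illustrative)
pa <- (p >= threshold) * 1        # presence-absence per model
prop_presence <- rowMeans(pa)     # share of models predicting presence

# Assumed entropy definition: binary Shannon entropy (log base 2),
# with 0 * log2(0) treated as 0.
binary_entropy <- function(q) {
  h <- -(q * log2(q) + (1 - q) * log2(1 - q))
  h[q == 0 | q == 1] <- 0
  h
}

entropy <- binary_entropy(prop_presence)
stdev   <- apply(p, 1, sd)        # standard deviation across models
cv      <- stdev / rowMeans(p)    # coefficient of variation across models
```

Under these assumptions, the first and third sites have an entropy of 0 because all four models agree, while the second site has an entropy of 1 because the models split evenly between presence and absence.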

Value

- a Raster object if newdata is a Raster object

- a numeric vector (or a data.frame) if newdata is a data.frame object

Author(s)

Babak Naimi naimi.b@gmail.com

https://www.r-gis.net/

https://www.biogeoinformatics.org/

References

Naimi, B., Araujo, M.B. (2016) sdm: a reproducible and extensible R platform for species distribution modelling, Ecography, 39:368-375, DOI: 10.1111/ecog.01881

Examples

## Not run: 


library(sdm)

library(terra) # provides vect() and rast()

file <- system.file("external/species.shp", package="sdm") # get the location of the species data

species <- vect(file) # read the shapefile

path <- system.file("external", package="sdm") # path to the folder that contains the data

lst <- list.files(path=path,pattern='asc$',full.names = TRUE) # list the names of the raster files


# rast is a function in the terra package, to read/create a multi-layer raster dataset
preds <- rast(lst) # making a raster object

d <- sdmData(formula=Occurrence~., train=species, predictors=preds)

d

# fit the models (5 methods, and 10 replications using bootstrapping procedure):
m <- sdm(Occurrence~.,data=d,methods=c('rf','tree','fda','mars','svm'),
          replication='boot',n=10)
    
# ensemble using weighted averaging based on AUC statistic:    
p1 <- ensemble(m, newdata=preds, filename='ens.img',setting=list(method='weighted',stat='AUC'))
plot(p1)

# ensemble using weighted averaging based on TSS statistic
# and optimum threshold criterion 2 (i.e., max(se+sp)):
p2 <- ensemble(m, newdata=preds, filename='ens2.img',setting=list(method='weighted',
                                                                  stat='TSS',opt=2))
plot(p2)


## End(Not run)



babaknaimi/sdm documentation built on April 4, 2024, 1:45 p.m.