knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(raster)
library(viridisLite)
Running Ensemble Random Forests (ERFs) on a given dataset is easy. The function ens_random_forests() takes a dataset in R data.frame format, prepares it for modeling using erf_data_prep() and erf_formula_prep(), runs each RF in the ensemble using rf_ens_fn(), and returns a fitted ERF object. This object can then be passed to various output functions, erf_plotter() and ..., to visualize and summarize the results.
First, we must load the R library.
library(EnsembleRandomForests)
The provided dataset, simData, is a list object that contains a data.frame of the sampled locations, the beta coefficients of the logistic model used to predict the probability of occurrence, and a raster brick object containing the gridded covariates, log-odds of occurrence, and probabilities of occurrence.
# We can also visualize the covariates
par(mar=c(0,0.5,2,0.5), oma=c(1,1,1,1))
layout(matrix(c(1,1,2,2,3,3,0,4,4,5,5,0),2,6,byrow=TRUE))
r <- range(cellStats(simData$grid[[1:5]],'range'))
for(i in 1:5){
  image(simData$grid[[i]], col=inferno(100), zlim = r,
        xaxt='n', yaxt='n', xlab="", ylab="")
  title(paste0('Covariate ', i))
}
We can also see the beta coefficients that produced the probability of presence using the model below: $$\begin{equation} \log\left[\frac{\hat{P}_{obs=1}}{1-\hat{P}_{obs=1}}\right] = \alpha + \beta_1X_1 + \dots + \beta_nX_n \end{equation}$$
print(round(simData$betas,3))
# We can visualize the log-odds and the probability of presence
par(mar=c(0,0.5,2,0.5), oma=c(1,1,1,1), mfrow=c(1,2))
image(simData$grid[[6]], col=inferno(100), xaxt='n', yaxt='n', xlab="", ylab="")
title("Log-odds")
image(simData$grid[[7]], col=viridis(100), xaxt='n', yaxt='n', xlab="", ylab="")
with(simData$samples[simData$samples$obs==1,], points(x,y,pch=16,col='white'))
title("Probability of Presence")
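As a minimal sketch of how the logistic model above turns covariates into a probability of presence, the block below applies the inverse-logit to a linear predictor in base R. The intercept, coefficients, and covariate values are made-up illustration values, not those stored in simData$betas.

```r
# Hypothetical intercept and coefficients (illustration only; not simData$betas)
alpha <- -3
beta  <- c(0.8, -0.5, 1.2)

# Three hypothetical locations (rows) by three covariates (columns)
X <- matrix(c( 0.0, 0.0,  0.0,
               1.0, 0.5, -0.2,
              -1.5, 2.0,  0.3),
            nrow = 3, byrow = TRUE)

# Linear predictor: log-odds of presence at each location
log_odds <- as.vector(alpha + X %*% beta)

# Inverse-logit maps log-odds onto the probability scale
p_hat <- plogis(log_odds)

# Applying the logit recovers the log-odds, matching the equation above
stopifnot(isTRUE(all.equal(log(p_hat / (1 - p_hat)), log_odds)))

print(round(cbind(log_odds, p_hat), 3))
```

With all covariates at zero (the first row), the log-odds equal the intercept, so the intercept alone sets the baseline probability of presence.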
Now that we have covered the dataset, let's run an ERF. This is simple using ens_random_forests().
ens_rf_ex <- ens_random_forests(df=simData$samples,
                                var="obs",
                                covariates=grep("cov",colnames(simData$samples),value=TRUE),
                                header = NULL,
                                save=FALSE,
                                out.folder=NULL,
                                duplicate = TRUE,
                                n.forests = 10L,
                                importance = TRUE,
                                ntree = 1000,
                                mtry = 5,
                                var.q = c(0.1,0.5,0.9),
                                cores = parallel::detectCores()-2)
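Two arguments in the call above are computed rather than typed out: covariates uses grep() to pick column names by pattern, and cores subtracts two from the detected core count. The sketch below reproduces both in base R on a toy data.frame whose column names are hypothetical. The guard on the core count is a suggestion rather than something the package is known to require; it keeps the value at one or more on small machines or when detectCores() returns NA.

```r
# Toy data.frame with hypothetical column names (illustration only)
toy <- data.frame(obs = c(0, 1, 0), x = 1:3, y = 3:1,
                  cov1 = rnorm(3), cov2 = rnorm(3), depth = runif(3))

# value = TRUE returns the matching names rather than their positions
covs <- grep("cov", colnames(toy), value = TRUE)
print(covs)

# detectCores() can return NA or a small number; keep at least one core
n_detected <- parallel::detectCores()
n_cores <- if (is.na(n_detected)) 1L else max(1L, n_detected - 2L)
print(n_cores)
```

Note that the pattern "cov" matches any column name containing those letters, so a stricter pattern such as "^cov" may be safer when other columns share the substring.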
The arguments to ens_random_forests() are:
df: the data.frame containing the presences/absences and the covariates
var: the column name of the presence/absence variable
covariates: the column names of the covariates to use. Here, we grabbed anything with "cov" in the column name
header: additional column names you may wish to append to the data.frame produced internally
save: a logical flag for whether to save the model to the working directory or an optional out.folder directory
duplicate: a logical flag to control whether to duplicate observations with more than one presence
n.forests: the number of forests to generate in the ensemble. See the optimization vignette for more information on tuning this parameter
importance: a logical flag for whether to calculate variable importance
ntree: the number of trees in each Random Forest in the ensemble
mtry: the number of covariates to try at each node in each tree in each Random Forest in the ensemble
var.q: quantiles for the distribution of the variable importance; only executed if importance=TRUE
cores: the number of cores to run the model on

We can look at some of the output produced by the random forests (see help(ens_random_forests) for a full list):
#view the dataset used in the model
head(ens_rf_ex$data)
#view the ensemble model predictions
head(ens_rf_ex$ens.pred)
#view the threshold-free ensemble performance metrics
unlist(ens_rf_ex$ens.perf[c('auc','rmse','tss')])
#view the mean test threshold-free performance metrics for each RF
ens_rf_ex$mu.te.perf
#structure of the individual model predictions
str(ens_rf_ex$pred)
As we can see, the ensemble performs better than the individual RFs do on average on their test sets. This is an advantage of ERFs over other RF modifications for extreme class imbalance. Siders et al. (2020) discuss the performance of these other modifications if you are curious.
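To see why an averaged prediction can outperform its members, here is a toy base R simulation. It is not the ERF algorithm and does not use the package: each "model" is simply the true label plus independent noise (an assumption made purely for illustration), and AUC is computed with the rank-based Mann-Whitney formula.

```r
set.seed(42)

# Rank-based (Mann-Whitney) AUC for binary labels y and numeric scores
auc <- function(score, y) {
  r  <- rank(score)
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

n        <- 2000
n_models <- 10

# Imbalanced response: roughly 5% presences
y <- rbinom(n, size = 1, prob = 0.05)

# Each column is one noisy "model": the true label plus independent noise
preds <- sapply(seq_len(n_models), function(i) y + rnorm(n, sd = 2))

# Performance of each member versus the averaged (ensemble) prediction
member_auc   <- apply(preds, 2, auc, y = y)
ensemble_auc <- auc(rowMeans(preds), y)

print(round(c(mean_member = mean(member_auc), ensemble = ensemble_auc), 3))
```

Because the members' errors are independent here, averaging cancels much of the noise. Forests within a real ensemble are correlated, so the gain is typically smaller than in this toy setting.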