This function builds an ecological Diagnostic Tool (DT) that predicts an Impairment Probability for one or several anthropogenic pressures.
metrics |
a data frame with samples in rows and biological metrics in columns |
pressures |
a data frame with samples in rows and pressure information in columns (one per pressure category). The table is filled with quality classes (e.g. low or impaired) |
low, impaired |
character vectors with the labels of the pressure classes
(in |
pathDT |
character string, the path where the built models will be saved |
params |
a named list giving one value, or two values (minimum, maximum), for each of the following parameters: - num.trees: number of trees to grow; - mtry: number of variables randomly sampled as candidates at each split; - sample.fraction: proportion of samples to draw; - min.node.size: minimum size of terminal nodes. |
CVfolds |
an integer indicating the number of parts made from the training data set and used to calibrate the model hyper-parameters. |
nIter |
an integer indicating the number of ranger RF models created for each pressure type. Setting nIter larger than 1 makes it possible to estimate prediction uncertainty and improves model robustness. |
nCores |
an integer indicating the number of CPU cores available to parallelize the calibration step |
trainingFrac |
a number between 0 and 1 indicating the proportion of the data set that will be used to train the model |
samplingUnit |
a vector with a length equal to the number of rows of metrics and pressures indicating the group to which each observation belongs. The training and test data sets will be obtained by sampling these groups rather than the observations (unless samplingUnit = NULL, the default). |
calibPopSize |
numeric. The size of the population used by the genetic algorithm used to calibrate the parameters. |
calibGenNb |
numeric. The number of generations used by the genetic algorithm used to calibrate the parameters. (calibGenNb + 1) * calibPopSize gives the total number of iterations performed by the calibration algorithm. |
seed |
numeric. The seed used for the random number generator |
The function takes two tables as input: one with a categorical description (quality classes) of the samples for one or several anthropogenic pressures, and one with the values of the biological metrics calculated from the community data of the same samples.
For each pressure (i.e. each column of the pressures table), a model is built and saved in the directory given by the pathDT argument. The whole set of models (DT units) saved in this directory constitutes the DT.
Each DT unit is a probability random forest model built using the ranger function to predict the probability of a community being impaired by the pressure considered based on the biological metrics exhibited by the communities.
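As a rough illustration of what a single DT unit looks like, the sketch below fits one probability forest with ranger on simulated data. The data frame, the metric names and the parameter values are all hypothetical, and this is not the code the function runs internally; it only shows how the four hyper-parameters listed in params map onto a ranger call.

```r
# Sketch of a single "DT unit"; requires the ranger package.
if (requireNamespace("ranger", quietly = TRUE)) {
  set.seed(42)
  n <- 200

  # Hypothetical biological metrics and a binary pressure class
  dat <- data.frame(metric1 = rnorm(n), metric2 = rnorm(n), metric3 = rnorm(n))
  dat$pressure <- factor(ifelse(dat$metric1 + rnorm(n, sd = 0.5) > 0,
                                "impaired", "low"),
                         levels = c("low", "impaired"))

  # probability = TRUE grows a probability forest instead of a classifier
  fit <- ranger::ranger(dependent.variable.name = "pressure", data = dat,
                        probability = TRUE,
                        num.trees = 200, mtry = 2,
                        sample.fraction = 0.632, min.node.size = 10,
                        seed = 42)

  # One column per class; the "impaired" column is the impairment probability
  probs <- predict(fit, data = dat[1:5, ])$predictions
  probs[, "impaired"]
}
```

Each row of the prediction matrix sums to 1, so the probability of the low class is simply the complement of the impairment probability.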
For each DT unit, the given metrics and pressures tables are split into training and test data sets. This is controlled by the trainingFrac argument, which specifies the proportion of the data (once observations with missing pressure information are removed) from each pressure level (low or impaired) that is used to constitute the training data (stratified sampling). By default, trainingFrac refers to the observations (rows of metrics and pressures), but if a grouping vector (e.g. site ID) is given to the samplingUnit argument, then the training data set is built by sampling among the sampling units and not among the rows. If a site has observations with different pressure levels (low or impaired), then the level occurring with the highest frequency is allocated to the site.
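The grouped, stratified split described above can be sketched in base R as follows. The pressure vector, the site labels and the intermediate object names are made up for illustration, and the package may implement the details (rounding, tie handling) differently.

```r
# Illustrative sketch only; not the package's internal code.
set.seed(1)

# Hypothetical data: 24 observations from 8 sites, one pressure column
pressure <- c("low", "low", "low",                 # site1
              "impaired", "impaired", "impaired",  # site2
              "impaired", "low", "impaired",       # site3 (mixed)
              "low", "low", NA,                    # site4 (one missing)
              "impaired", "impaired", "low",       # site5 (mixed)
              "low", "low", "low",                 # site6
              "impaired", "impaired", "impaired",  # site7
              "low", "impaired", "low")            # site8 (mixed)
samplingUnit <- rep(paste0("site", 1:8), each = 3)
trainingFrac <- 0.5

# 1. Drop observations with missing pressure information
keep  <- !is.na(pressure)
press <- pressure[keep]
unit  <- samplingUnit[keep]

# 2. Allocate to each unit its most frequent pressure level
unitLevel <- vapply(split(press, unit),
                    function(x) names(which.max(table(x))),
                    character(1))

# 3. Within each level, sample a fraction trainingFrac of the units
trainUnits <- unlist(lapply(split(names(unitLevel), unitLevel), function(u) {
  u[sample.int(length(u), size = round(trainingFrac * length(u)))]
}), use.names = FALSE)

# 4. Observations inherit the allocation of their unit
isTrain <- unit %in% trainUnits
table(set = ifelse(isTrain, "training", "test"), pressure = press)
```

Because whole sites are assigned to one side of the split, no site contributes observations to both the training and the test data.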
The hyper-parameters of the ranger model are given in the params argument, which accepts one or several values per parameter. If several values are given, an optimization procedure using tuneParamsMultiCrit is performed to identify the parameter set with the best trade-off between performance (AUC) and execution time. Two optimization algorithms are implemented: a grid search and a genetic optimization algorithm. If the argument calibGenNb is larger than one, the genetic algorithm is used and the search space for each parameter is bounded by the minimum and maximum values given in params. If calibGenNb is smaller than or equal to 1, a grid search testing all combinations of the params values is performed.
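To make the difference between the two calibration modes concrete, the base R sketch below shows how the same params list can be read either as a grid or as a set of bounds. The parameter values are hypothetical and the code does not call tuneParamsMultiCrit; it only illustrates the size of the search in each case.

```r
# Hypothetical params list: a single value is fixed, several values are tuned
params <- list(num.trees       = c(100, 500),
               mtry            = c(2, 4),
               sample.fraction = 0.632,
               min.node.size   = c(5, 20))

# Grid search (calibGenNb <= 1): every combination of the supplied values
grid <- expand.grid(params)
nrow(grid)   # 2 * 2 * 1 * 2 = 8 candidate parameter sets

# Genetic algorithm (calibGenNb > 1): params only bounds the search space
bounds <- t(sapply(params, range))
colnames(bounds) <- c("min", "max")
bounds

# Number of evaluations, following the formula in the calibGenNb description
calibPopSize <- 20
calibGenNb   <- 4
nEval <- (calibGenNb + 1) * calibPopSize
nEval        # 100
```

With only a few candidate values per parameter the grid stays small; when the ranges are wide, a bounded search with a fixed evaluation budget can be easier to keep under control.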
Nothing is returned. The models and the data used to build them are saved as .rda objects in the directory given by the pathDT argument.
ranger tuneParamsMultiCrit