
This function builds an ecological Diagnostic Tool (DT) that predicts an Impairment Probability for one or several anthropogenic pressures.


`metrics` |
a data frame with samples in rows and biological metrics in columns |

`pressures` |
a data frame with samples in rows and pressure information in columns (one per pressure category). The table is filled with quality classes (e.g. low or impaired) |

`low, impaired` |
character vectors with the labels of the pressure classes
(in |

`pathDT` |
character string, the path where the built models will be saved |

`params` |
a named list giving, for each of the following parameters, either one value or two values (minimum, maximum): - num.trees: number of trees to grow; - mtry: number of variables randomly sampled as candidates at each split; - sample.fraction: proportion of samples to draw; - min.node.size: minimum size of terminal nodes. |

`CVfolds` |
an integer indicating the number of parts made from the training data set and used to calibrate the model hyper-parameters. |

`nIter` |
an integer indicating the number of ranger RF models created for each pressure type. Values of nIter larger than 1 make it possible to estimate prediction uncertainty and improve model robustness. |

`nCores` |
an integer indicating the number of CPU cores available to parallelize the calibration step |

`trainingFrac` |
a number between 0 and 1 indicating the proportion of the data set that will be used to train the model |

`samplingUnit` |
a vector with a length equal to the number of rows of metrics and pressures indicating the group to which each observation belongs. The training and test data sets will be obtained by sampling these groups and not the observations (except if samplingUnit = NULL, the default). |

`calibPopSize` |
numeric. The population size of the genetic algorithm used to calibrate the parameters. |

`calibGenNb` |
numeric. The number of generations of the genetic algorithm used to calibrate the parameters. (calibGenNb + 1) * calibPopSize gives the total number of iterations performed by the calibration algorithm. |

`seed` |
numeric. The seed used for the random number generator |

The function takes as input two tables: one with a categorical description (quality classes) of samples by one or several anthropogenic `pressures`, and a second with the values of biological `metrics` calculated from the community data of the same samples.
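As a rough illustration of the two input tables, here is a minimal sketch in base R. The sample names, column names and values are invented for the example and are not taken from the package.

```r
# Hypothetical toy data: 6 samples, 2 pressure categories, 3 metrics.
# All names and values are invented for illustration only.
samples <- paste0("S", 1:6)

# One column per pressure category, filled with quality classes
# (NA marks a sample with missing pressure information).
pressures <- data.frame(
  organic   = c("low", "impaired", "low", "impaired", NA, "low"),
  hydrology = c("impaired", "impaired", "low", "low", "low", "impaired"),
  row.names = samples,
  stringsAsFactors = FALSE
)

# One column per biological metric, same samples in the same order.
metrics <- data.frame(
  richness  = c(25, 12, 30, 9, 18, 27),
  shannon   = c(2.8, 1.5, 3.1, 1.2, 2.2, 2.9),
  pct_ept   = c(0.45, 0.10, 0.52, 0.08, 0.30, 0.48),
  row.names = samples
)

# Both tables must describe the same samples, row by row.
stopifnot(identical(rownames(metrics), rownames(pressures)))
```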

For each pressure (i.e. each column in the `pressures` table), a model is built and saved in the directory given by the `pathDT` argument. The whole set of models (DT units) saved in this directory constitutes the DT.

Each DT unit is a probability random forest model built using the ranger function to predict the probability of a community being impaired by the pressure considered based on the biological metrics exhibited by the communities.
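The kind of model described above can be sketched by calling ranger directly with `probability = TRUE`. This is not the package's internal code: the simulated data, the column names and the hyper-parameter values below are assumptions made for the example.

```r
library(ranger)

set.seed(42)
n <- 200

# Simulated metrics; the pressure class is loosely driven by `richness`
# (hypothetical data, for illustration only).
dat <- data.frame(
  richness = rnorm(n, mean = 20, sd = 5),
  shannon  = rnorm(n, mean = 2.5, sd = 0.5)
)
dat$pressure <- factor(
  ifelse(dat$richness + rnorm(n, sd = 3) < 20, "impaired", "low"),
  levels = c("low", "impaired")
)

# Probability forest: each prediction is a vector of class probabilities.
rf <- ranger(
  dependent.variable.name = "pressure",
  data            = dat,
  probability     = TRUE,
  num.trees       = 200,
  mtry            = 1,
  sample.fraction = 0.632,
  min.node.size   = 10,
  seed            = 42
)

# Impairment probability for the first five communities.
pred <- predict(rf, data = dat[1:5, c("richness", "shannon")])
impairment_prob <- pred$predictions[, "impaired"]
```

The `predictions` matrix has one column per class level, so the impairment probability is read from the column named after the impaired class.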

For each DT unit, the given metrics and pressures tables are split into training and test data sets. The trainingFrac argument specifies the proportion of the data (once observations with a missing pressure value are removed) from each pressure level (low or impaired) used to constitute the training data (stratified sampling). By default, trainingFrac refers to the observations (rows of metrics and pressures), but if a grouping vector (e.g. site ID) is given to the samplingUnit argument, the training data set is built by sampling among the groups defined by samplingUnit rather than among the rows. If a site has observations with different pressure levels (low or impaired), the level occurring with the highest frequency is allocated to the site.
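The group-based stratified split can be sketched in base R as follows. This is one possible reading of the description, not the package's implementation; the site names, the simulated pressure levels and the helper objects are hypothetical.

```r
set.seed(1)

# Hypothetical data: 10 sites with 3 observations each, one pressure column.
site         <- rep(paste0("site", 1:10), each = 3)
pressure     <- sample(c("low", "impaired"), length(site), replace = TRUE)
trainingFrac <- 0.7

# 1. Allocate to each site its most frequent pressure level.
site_level <- tapply(pressure, site, function(x) names(which.max(table(x))))

# 2. Within each level, draw a fraction `trainingFrac` of the sites
#    (stratified sampling of groups rather than rows).
train_sites <- unlist(lapply(
  split(names(site_level), site_level),
  function(s) s[sample.int(length(s), size = round(trainingFrac * length(s)))]
))

# 3. Every observation inherits the training/test status of its site.
is_train <- site %in% train_sites
```

Sampling whole sites keeps all observations of a site on the same side of the split, which avoids evaluating the model on sites it has already seen during training.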

The hyper-parameters of the ranger model are given in the params argument, which accepts one or several values per parameter. If several values are given, an optimization procedure using tuneParamsMultiCrit is performed to identify the parameter set exhibiting the best trade-off between performance (AUC) and execution time. Two optimization algorithms are implemented: a grid search and a genetic optimization algorithm. If the argument `calibGenNb` is larger than one, the genetic algorithm is used and the search space for each parameter is bounded by the minimum and maximum values given in `params`. If `calibGenNb` is smaller than or equal to 1, a grid search testing all the `params` value combinations is performed.
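To give a sense of the cost of each calibration mode, the sketch below counts the candidate parameter sets of a grid search with `expand.grid` and applies the documented formula `(calibGenNb + 1) * calibPopSize` for the genetic algorithm. The parameter values are arbitrary examples, and `tuneParamsMultiCrit` itself is not called.

```r
# Hypothetical parameter list (values chosen for illustration only).
params <- list(
  num.trees       = c(100, 500),
  mtry            = c(2, 4),
  sample.fraction = 0.632,
  min.node.size   = c(5, 20)
)

# Grid search (calibGenNb <= 1): every combination of values is tested.
grid <- expand.grid(params)
n_grid <- nrow(grid)  # 2 * 2 * 1 * 2 = 8 combinations

# Genetic algorithm (calibGenNb > 1): the (minimum, maximum) pairs bound
# the search space, and the number of evaluations follows the formula
# given in the argument description.
calibPopSize  <- 20
calibGenNb    <- 5
n_evaluations <- (calibGenNb + 1) * calibPopSize  # 6 * 20 = 120
```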

Nothing. The models and the data used are saved as .rda objects in the directory given by the pathDT argument.
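Since the results are written to disk rather than returned, they have to be read back with `load()`. The sketch below shows one way to do so, assuming the saved files carry the `.rda` extension; the directory, file name and object name are placeholders, because the names actually written by the function are not documented here.

```r
# Hypothetical directory and file: a dummy object stands in for a DT unit.
pathDT <- file.path(tempdir(), "DT_demo")
dir.create(pathDT, showWarnings = FALSE)
dummy_model <- list(note = "placeholder for a saved DT unit")
save(dummy_model, file = file.path(pathDT, "model_organic.rda"))

# Load every .rda file of the directory into a dedicated environment,
# so that the loaded objects do not overwrite anything in the workspace.
rda_files <- list.files(pathDT, pattern = "\\.rda$", full.names = TRUE)
dt_env    <- new.env()
loaded    <- unlist(lapply(rda_files, load, envir = dt_env))

# Names of the objects now available in dt_env.
ls(dt_env)
```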

ranger, tuneParamsMultiCrit

CedricMondy/ecodiag documentation built on Nov. 7, 2018, 2:30 p.m.
