
View source: R/cross_validate.R

## Description

Cross-validate one or multiple gaussian or binomial models at once. Supports repeated cross-validation. Returns results in a tibble for easy comparison, reporting and further analysis.

See `cross_validate_fn()` for use with custom model functions.

## Arguments

`data`: Data frame. Must include a grouping factor for identifying folds, as made with `groupdata2::fold()`.

`models`: Model formulas as strings. (Character) Can contain random effects. E.g. `c("score~diagnosis+(1|session)", "score~age+(1|session)")`, as in the Examples.

`fold_cols`: Name(s) of grouping factor(s) for identifying folds. (Character) Include names of multiple grouping factors for repeated cross-validation.

`family`: Name of family. (Character) Currently supports `"gaussian"` and `"binomial"`.

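`link`: Link function. (Character) E.g. `link = "log"` (see Examples). See `?stats::family` for the available link functions. Default link functions: Gaussian: `"identity"`; Binomial: `"logit"`.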

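`control`: Control structures for mixed model fitting (i.e. `lme4::lmer()` and `lme4::glmer()`); see `lme4::lmerControl()` and `lme4::glmerControl()`. N.B. Ignored if fitting `stats::lm()` or `stats::glm()` models.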

`REML`: Restricted Maximum Likelihood. (Logical)

`cutoff`: Threshold for predicted classes. (Numeric) N.B. Binomial models only.

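`positive`: Level from dependent variable to predict. Either as character or level index (`1` or `2`, with the levels sorted alphabetically). E.g. if the levels are `"cat"` and `"dog"` and `"dog"` should be the positive class, pass either `"dog"` or `2`, as `"dog"` comes after `"cat"` alphabetically. Used when calculating confusion matrix metrics and creating ROC curves. N.B. Only affects evaluation metrics, not the model training or returned predictions. N.B. Binomial models only.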

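`metrics`: List for enabling/disabling metrics. E.g. `list("RMSE" = FALSE)` removes RMSE from the results, and `list("Accuracy" = TRUE)` adds accuracy to the binomial results. Also accepts the string `"all"`. N.B. Currently, disabled metrics are still computed.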

`rm_nc`: Remove non-converged models from output. (Logical)

`parallel`: Whether to cross-validate the list of models in parallel. (Logical) Remember to register a parallel backend first, e.g. with `doParallel::registerDoParallel()`.

`model_verbose`: Whether to message the name of the model function used in each iteration. (Logical)
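How `cutoff` and `positive` relate can be sketched in base R. This is a minimal illustration, not cvms's internal code: the helpers `to_classes()` and `resolve_positive()` are hypothetical, and treating probabilities strictly above `cutoff` as the second level is an assumption.

```r
# Minimal sketch, not cvms internals. Hypothetical helper: turn predicted
# probabilities of the second level (alphabetically) into class labels.
# Assumption: a probability strictly above `cutoff` means the second level.
to_classes <- function(probs, levels, cutoff = 0.5) {
  levels <- sort(levels)
  ifelse(probs > cutoff, levels[2], levels[1])
}

# Hypothetical helper: `positive` may be a level name or a level index
# (1 or 2), where the index refers to the alphabetically sorted levels.
resolve_positive <- function(positive, levels) {
  levels <- sort(levels)
  if (is.numeric(positive)) levels[positive] else positive
}

probs <- c(0.10, 0.55, 0.80, 0.40)
classes <- to_classes(probs, c("dog", "cat"), cutoff = 0.5)
print(classes)  # "cat" "dog" "dog" "cat"

resolve_positive(2, c("dog", "cat"))      # "dog"
resolve_positive("dog", c("dog", "cat"))  # "dog"
```

Raising `cutoff` to 0.6 would turn the second prediction (0.55) into `"cat"` as well, while `positive` only changes which level the evaluation metrics treat as the one to detect.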

## Details

Packages used:

Gaussian: `stats::lm`, `lme4::lmer`

Binomial: `stats::glm`, `lme4::glmer`

r2m : `MuMIn::r.squaredGLMM`

r2c : `MuMIn::r.squaredGLMM`

AIC : `stats::AIC`

AICc : `MuMIn::AICc`

BIC : `stats::BIC`

Confusion matrix: `caret::confusionMatrix`

ROC: `pROC::roc`

MCC: `mltools::mcc`
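The information criteria above differ only in their penalty terms. A minimal base-R sketch under stated assumptions: it uses the built-in `mtcars` data purely for illustration, and it assumes the usual small-sample correction AICc = AIC + 2k(k + 1) / (n - k - 1), which is how `MuMIn::AICc` is commonly described (not checked against this cvms version).

```r
# Illustrative data only: the built-in mtcars data set, not cvms data.
fit <- lm(mpg ~ wt, data = mtcars)

n <- nobs(fit)                  # number of observations
k <- attr(logLik(fit), "df")    # estimated parameters (coefficients + sigma)

aic <- AIC(fit)                 # -2 * logLik + 2 * k
bic <- BIC(fit)                 # -2 * logLik + log(n) * k

# Assumed small-sample correction for AICc (standard formula)
aicc <- aic + (2 * k * (k + 1)) / (n - k - 1)

round(c(AIC = aic, AICc = aicc, BIC = bic), 2)
```

With 32 observations and 3 estimated parameters, the AICc correction adds 24 / 28 to the AIC; the correction shrinks as n grows relative to k.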

## Value

Tbl (tibble) with results for each model.

**Shared across families**

A nested tibble with **coefficients** of the models from all iterations.

Number of *total* **folds**.

Number of **fold columns**.

Count of **convergence warnings**. Consider discarding models that did not converge on all
iterations. Note: you might still see results, but these should be taken with a grain of salt!

Count of **other warnings**. These are warnings without keywords such as "convergence".

Count of **Singular Fit messages**. See `?lme4::isSingular` for more information.

Nested tibble with the **warnings and messages** caught for each model.

Specified **family**.

Specified **link** function.

Name of **dependent** variable.

Names of **fixed** effects.

Names of **random** effects, if any.

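----------------------------------------------------------------

**Gaussian Results**

----------------------------------------------------------------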

Average **RMSE**, **MAE**, **r2m**, **r2c**, **AIC**, **AICc**, and **BIC** of all the iterations\*, omitting potential NAs from non-converged iterations.
Note that the Information Criteria metrics (AIC, AICc, and BIC) are also averages.

A nested tibble with the **predictions** and targets.

A nested tibble with the non-averaged **results** from all iterations.

* In *repeated cross-validation*,
the metrics are first averaged for each fold column (repetition) and then averaged again.
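The two-stage averaging used in repeated cross-validation can be sketched in base R. The per-fold results below are made-up numbers, the column names are hypothetical, and `rmse()`/`mae()` are the standard definitions rather than cvms's own code; an `NA` stands in for a non-converged iteration.

```r
# Standard definitions, computed on each held-out (test) fold
rmse <- function(target, pred) sqrt(mean((target - pred)^2))
mae  <- function(target, pred) mean(abs(target - pred))

# Made-up per-iteration RMSE values: 2 fold columns (repetitions) x 3 folds.
# NA marks a hypothetical non-converged iteration.
results <- data.frame(
  fold_column = rep(c(".folds_1", ".folds_2"), each = 3),
  fold        = rep(1:3, times = 2),
  RMSE        = c(1, 2, 3, 4, 4, NA)
)

# Step 1: average within each fold column, omitting NAs
per_fold_col <- tapply(results$RMSE, results$fold_column, mean, na.rm = TRUE)

# Step 2: average the fold-column averages
overall_rmse <- mean(per_fold_col)

per_fold_col  # .folds_1 = 2, .folds_2 = 4
overall_rmse  # 3
```

With these numbers the fold-column averages are 2 and 4, so the reported value is 3, whereas a flat mean over the five non-missing folds would give 2.8.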

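----------------------------------------------------------------

**Binomial Results**

----------------------------------------------------------------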

Based on the **collected** predictions from the test folds*,
a confusion matrix and a ROC curve are created to get the following:

ROC:

**AUC**, **Lower CI**, and **Upper CI**

Confusion Matrix:

**Balanced Accuracy**, **F1**,
**Sensitivity**, **Specificity**,
**Positive Prediction Value**,
**Negative Prediction Value**,
**Kappa**,
**Detection Rate**,
**Detection Prevalence**,
**Prevalence**, and
**MCC** (Matthews correlation coefficient).

Other available metrics (disabled by default, see `metrics`): **Accuracy**.
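The confusion-matrix metrics listed above follow standard formulas. The base-R sketch below computes them from a handful of made-up collected predictions. It is an illustration only (cvms itself relies on `caret::confusionMatrix` and `mltools::mcc`); it assumes a cutoff of 0.5 with probabilities strictly above it counting as the second level, and it omits the ROC/AUC metrics.

```r
# Made-up collected predictions from all test folds:
# probability of "dog" (the second level alphabetically) and the targets.
target   <- c("dog", "dog", "dog", "cat", "cat", "cat", "dog", "cat")
prob_dog <- c(0.9, 0.7, 0.4, 0.2, 0.4, 0.1, 0.8, 0.3)
positive <- "dog"

# Predicted classes with cutoff = 0.5 (assumption: strictly above means "dog")
pred <- ifelse(prob_dog > 0.5, "dog", "cat")

# Confusion-matrix cells relative to the positive class
tp <- sum(pred == positive & target == positive)
tn <- sum(pred != positive & target != positive)
fp <- sum(pred == positive & target != positive)
fn <- sum(pred != positive & target == positive)
n  <- tp + tn + fp + fn

sensitivity <- tp / (tp + fn)
specificity <- tn / (tn + fp)
metrics <- c(
  Balanced_Accuracy    = (sensitivity + specificity) / 2,
  F1                   = 2 * tp / (2 * tp + fp + fn),
  Sensitivity          = sensitivity,
  Specificity          = specificity,
  Pos_Pred_Value       = tp / (tp + fp),
  Neg_Pred_Value       = tn / (tn + fn),
  Detection_Rate       = tp / n,
  Detection_Prevalence = (tp + fp) / n,
  Prevalence           = (tp + fn) / n,
  Accuracy             = (tp + tn) / n
)

# Cohen's kappa: observed agreement vs. agreement expected by chance
p_obs       <- (tp + tn) / n
p_chance    <- ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n^2
kappa_value <- (p_obs - p_chance) / (1 - p_chance)

# Matthews correlation coefficient
mcc <- (tp * tn - fp * fn) /
  sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

round(metrics, 3)
```

Here TP = 3, TN = 4, FP = 0 and FN = 1, giving a balanced accuracy of 0.875 and a kappa of 0.75.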

Also includes:

A nested tibble with **predictions**, predicted classes (depending on `cutoff`), and the targets.
Note that the **predictions are not necessarily of the specified positive class**, but of
the model's positive class (second level of dependent variable, alphabetically).

A nested tibble with the sensitivities and specificities from the **ROC** curve(s).

A nested tibble with the **confusion matrix**/matrices.
The `Pos_` columns tell you whether a row is a
True Positive (TP), True Negative (TN), False Positive (FP), or False Negative (FN),
depending on which level is the "positive" class, i.e. the level you wish to predict.

A nested tibble with the **results** from all fold columns, when using *repeated cross-validation*.

* In *repeated cross-validation*, an evaluation is made per fold column (repetition) and averaged.
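The note on predictions and the `Pos_` columns can be illustrated with a short base-R sketch. The helper `label_cells()` and the `Pos_dog`/`Pos_cat` column names are hypothetical stand-ins for the `Pos_` columns; the sketch only shows that the probability of the first level is the complement of the returned prediction, and that switching the positive level swaps TP with TN and FP with FN.

```r
# Hypothetical helper (not part of cvms): label each confusion-matrix cell
# as TP, TN, FP, or FN for a chosen positive level.
label_cells <- function(target, predicted, positive) {
  ifelse(predicted == positive,
         ifelse(target == positive, "TP", "FP"),
         ifelse(target == positive, "FN", "TN"))
}

levels_sorted <- sort(c("dog", "cat"))  # "cat" "dog"

# Predictions are probabilities of the second level ("dog") ...
prob_second <- c(0.8, 0.3)
# ... so the probability of the first level ("cat") is the complement
prob_first <- 1 - prob_second

# All four Prediction/Target combinations, labelled for each positive level
cells <- expand.grid(Prediction = levels_sorted, Target = levels_sorted,
                     stringsAsFactors = FALSE)
cells$Pos_dog <- label_cells(cells$Target, cells$Prediction, "dog")
cells$Pos_cat <- label_cells(cells$Target, cells$Prediction, "cat")
cells
```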

## Author(s)

Ludvig Renbo Olsen

Benjamin Hugh Zachariae

## See Also

Other validation functions: `cross_validate_fn()`, `validate()`

## Examples

```
# Attach packages
library(cvms)
library(groupdata2) # fold()
library(dplyr) # %>% arrange()

# Data is part of cvms
data <- participant.scores

# Set seed for reproducibility
set.seed(7)

# Fold data
data <- fold(data, k = 4,
             cat_col = 'diagnosis',
             id_col = 'participant') %>%
  arrange(.folds)

# Cross-validate a single model

# Gaussian
cross_validate(data,
               models = "score~diagnosis",
               family = 'gaussian',
               REML = FALSE)

# Binomial
cross_validate(data,
               models = "diagnosis~score",
               family = 'binomial')

# Cross-validate multiple models
models <- c("score~diagnosis+(1|session)",
            "score~age+(1|session)")

cross_validate(data,
               models = models,
               family = 'gaussian',
               REML = FALSE)

# Use non-default link functions
cross_validate(data,
               models = "score~diagnosis",
               family = 'gaussian',
               link = 'log',
               REML = FALSE)

# Use parallelization

# Attach doParallel and register four cores
# Uncomment:
# library(doParallel)
# registerDoParallel(4)

# Create list of 20 model formulas
models <- rep(c("score~diagnosis+(1|session)",
                "score~age+(1|session)"), 10)

# Cross-validate a list of 20 model formulas in parallel
system.time({cross_validate(data,
                            models = models,
                            family = 'gaussian',
                            parallel = TRUE)})

# Cross-validate a list of 20 model formulas sequentially
system.time({cross_validate(data,
                            models = models,
                            family = 'gaussian',
                            parallel = FALSE)})
```
