cross_valid: Cross validation of the generalised B-spline model

View source: R/cross_validation.R

cross_valid  R Documentation

Cross validation of the generalised B-spline model

Description

This function implements four cross-validation techniques for evaluating the predictive ability of the generalised B-spline model (sensu Lagat et al., 2021b):

  • validation set approach;

  • k-fold;

  • leave-one-out cross-validation (LOOCV); and

  • repeated k-fold.

Usage

cross_valid(gbsm_obj, type = "k-fold", p, k, k_fold.repeats)

Arguments

gbsm_obj

An object of class "gbsm" (i.e., the output of the gbsm function).

type

The type of cross-validation approach used. It must be one of "validation.set", "k-fold", "LOOCV", or "repeated.k-fold".

p

The proportion (a decimal between 0 and 1) of the data used to train the model. This value is used only when the cross-validation approach is "validation.set".
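The role of p can be sketched with a plain train/test split. This is a minimal illustration in base R, not the msco internals: the toy data frame, the lm() model standing in for the generalised B-spline model, and all variable names are assumptions made for the example.

```r
# Illustrative sketch only: toy data and lm() stand in for a fitted gbsm model.
set.seed(1)
n   <- 50
toy <- data.frame(x = runif(n))
toy$y <- 2 * toy$x + rnorm(n, sd = 0.1)

p <- 0.8                                         # proportion of rows used for training
train_idx <- sample(seq_len(n), size = floor(p * n))
train <- toy[train_idx, ]
test  <- toy[-train_idx, ]                       # remaining rows form the validation set

fit  <- lm(y ~ x, data = train)                  # fit on the training rows only
pred <- predict(fit, newdata = test)
test_rmse <- sqrt(mean((test$y - pred)^2))       # test error on the held-out rows
test_rmse
```

With p = 0.8 and 50 rows, 40 rows train the model and 10 are held out. Because the split is random, the test error depends on the seed.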

k

The number of folds used in the "k-fold" and "repeated.k-fold" types of cross-validation, i.e., the number of subsets (groups) into which the sample is split. A value of 5 or 10 is typically used in practice, as it gives a good bias-variance trade-off (Lagat et al., 2021b).
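How k partitions the data can be sketched as follows. This is a base R illustration under stated assumptions (toy data, lm() in place of the generalised B-spline model, hypothetical variable names) and does not reproduce how cross_valid is implemented.

```r
# Illustrative sketch only: k-fold cross-validation on toy data with lm().
set.seed(1)
n   <- 50
toy <- data.frame(x = runif(n))
toy$y <- 2 * toy$x + rnorm(n, sd = 0.1)

k <- 5
fold_id <- sample(rep(seq_len(k), length.out = n))   # random fold label for each row

# Each fold is held out once; the model is fitted on the other k - 1 folds.
fold_rmse <- sapply(seq_len(k), function(i) {
  held_out <- fold_id == i
  fit  <- lm(y ~ x, data = toy[!held_out, ])
  pred <- predict(fit, newdata = toy[held_out, ])
  sqrt(mean((toy$y[held_out] - pred)^2))
})

cv_rmse <- mean(fold_rmse)                           # average test error over the k folds
cv_rmse
```

Each row is used for testing exactly once and for training k - 1 times, so the whole procedure costs k model fits.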

k_fold.repeats

The number of times the k-fold procedure is repeated in the "repeated.k-fold" type of cross-validation.
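The effect of repeating k-fold cross-validation can be sketched by re-running the fold assignment several times and summarising the results. The helper kfold_rmse, the toy data, and the lm() model are hypothetical and serve only to illustrate the idea. They are not part of msco.

```r
# Illustrative sketch only: repeated k-fold cross-validation on toy data.
set.seed(1)
n   <- 50
toy <- data.frame(x = runif(n))
toy$y <- 2 * toy$x + rnorm(n, sd = 0.1)

# Hypothetical helper: one pass of k-fold CV, returning the mean RMSE over folds.
kfold_rmse <- function(data, k) {
  fold_id <- sample(rep(seq_len(k), length.out = nrow(data)))
  mean(sapply(seq_len(k), function(i) {
    held_out <- fold_id == i
    fit  <- lm(y ~ x, data = data[!held_out, ])
    pred <- predict(fit, newdata = data[held_out, ])
    sqrt(mean((data$y[held_out] - pred)^2))
  }))
}

repeats  <- 20                                    # plays the role of k_fold.repeats
rep_rmse <- replicate(repeats, kfold_rmse(toy, k = 5))

# Averaging over repeats reduces the dependence on any single random split.
c(mean = mean(rep_rmse), sd = sd(rep_rmse))
```

The cost grows linearly with the number of repeats: 20 repeats of 5-fold cross-validation require 100 model fits.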

Details

The k-fold cross-validation approach is highly recommended for its computational efficiency and acceptable bias-variance trade-off, provided that k is set to either 5 or 10 (Lagat et al., 2021b). For more details on the other cross-validation approaches, see Lagat et al. (2021c).
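The efficiency argument can be made concrete by noting that LOOCV is k-fold cross-validation with k equal to the sample size, so it needs one model fit per observation. The sketch below uses base R on toy data with lm() as a stand-in model. Every name in it is an assumption for illustration.

```r
# Illustrative sketch only: LOOCV as the k = n special case of k-fold CV.
set.seed(1)
n   <- 30
toy <- data.frame(x = runif(n))
toy$y <- 2 * toy$x + rnorm(n, sd = 0.1)

# Leave each row out in turn, fit on the remaining n - 1 rows, predict the left-out row.
loo_err <- sapply(seq_len(n), function(i) {
  fit <- lm(y ~ x, data = toy[-i, ])
  unname(toy$y[i] - predict(fit, newdata = toy[i, , drop = FALSE]))
})
loocv_rmse <- sqrt(mean(loo_err^2))

# Number of model fits required by each scheme on this sample.
n_fits <- c(LOOCV = n, five_fold = 5, ten_fold = 10)
n_fits
```

For a model that is expensive to fit, the gap between n fits and 5 or 10 fits is the main practical reason to prefer k-fold.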

Value

Depending on the type of cross-validation approach implemented, the cross_valid function returns:

  • a data.frame with the following test errors (for "validation.set"):

    • RMSE:   the root mean squared error;

    • R_squared:   Pearson's r^2; and

    • MAE:   the mean absolute error.

  • an array with the test errors listed above, together with the type of regression model used, the sample size, the number of predictors, the type of cross-validation performed, and a summary of sample sizes (for "k-fold", "LOOCV", and "repeated.k-fold").
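The three test errors can be computed from observed and predicted values as sketched below. The vectors obs and pred are made-up numbers, and R_squared is assumed here to be the squared Pearson correlation between observed and predicted values, which is how the description above reads. The package itself may compute it differently.

```r
# Illustrative sketch only: test-error metrics from made-up observed/predicted values.
obs  <- c(1.0, 2.0, 3.0, 4.0)
pred <- c(1.1, 1.9, 3.2, 3.8)

rmse      <- sqrt(mean((obs - pred)^2))   # root mean squared error
r_squared <- cor(obs, pred)^2             # squared Pearson correlation (assumed definition)
mae       <- mean(abs(obs - pred))        # mean absolute error

data.frame(RMSE = rmse, R_squared = r_squared, MAE = mae)
```

RMSE penalises large errors more heavily than MAE, so RMSE is never smaller than MAE on the same predictions.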

References

  1. Fushiki, T. (2011). Estimation of prediction error by using K-fold cross-validation. Stat. Comput. 21, 137-146. https://doi.org/10.1007/s11222-009-9153-8

  2. Lagat, V. K., Latombe, G. and Hui, C. (2021b). Dissecting the effects of random encounter versus functional trait mismatching on multi-species co-occurrence and interference with generalised B-spline modelling. DOI: ⁠<To be added>⁠.

  3. Lagat, V. K., Latombe, G. and Hui, C. (2021c). msco: an R software package for null model testing of multi-species interactions and interference with covariates. DOI: ⁠<To be added>⁠.

  4. Pearson, K. (1895). VII. Note on regression and inheritance in the case of two parents. Proceedings of the Royal Society of London, 58, 240-242. https://doi.org/10.1098/rspl.1895.0041

Examples

## Not run: 

my.path <- system.file("extdata/gsmdat", package = "msco")
setwd(my.path)
s.data <- get(load("s.data.csv")) ## Species-by-site matrix
t.data <- get(load("t.data.csv")) ## Species-by-trait matrix
p.d.mat <- get(load("p.d.mat.csv")) ## Species-by-species phylogenetic distance matrix

gbsm_obj <- msco::gbsm(s.data, t.data, p.d.mat, metric = "Simpson_eqn", d.f = 4,
  order.jo = 3, degree = 3, n = 1000, b.plots = FALSE, scat.plot = FALSE,
  response.curves = FALSE, leg = 1, max.vif, max.vif2, start.range = c(-0.1, 0))

val.set <- msco::cross_valid(gbsm_obj, type="validation.set", p=0.8)
val.set

kfold <- msco::cross_valid(gbsm_obj, type="k-fold", k=5)
kfold

loocv <- msco::cross_valid(gbsm_obj, type="LOOCV")
loocv

repeated.kfold <- msco::cross_valid(gbsm_obj, type="repeated.k-fold", k=5, k_fold.repeats=100)
repeated.kfold


## End(Not run)

vitaliskim/msco documentation built on Sept. 29, 2023, 9:22 p.m.