View source: R/cross_validation.R
cross_valid | R Documentation |
This function implements four cross-validation techniques for evaluating the predictive ability of the generalised B-spline model (sensu Lagat et al., 2021b):
validation set approach;
k-fold;
leave-one-out cross-validation (LOOCV), and
repeated k-fold.
cross_valid(gbsm_obj, type = "k-fold", p, k, k_fold.repeats)
gbsm_obj |
An object of |
type |
The type of the cross-validation approach used. It must be one of "validation.set", "k-fold", "LOOCV", or "repeated.k-fold". |
p |
The proportion of the data (as a decimal, e.g. 0.8) used to train the model. The value is used only if the cross-validation approach implemented is "validation.set". |
k |
The value of |
k_fold.repeats |
The number of replicates used in the "repeated.k-fold" type of cross-validation. |
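As an illustration of what the p argument controls, the sketch below splits a set of row indices into training and test subsets. It is a minimal base-R example under stated assumptions: the names n, train_idx and test_idx are hypothetical, and the code is not taken from the msco source, so the package may partition the data differently.

```r
# Illustrative only: a validation-set split driven by a proportion p.
# n, train_idx and test_idx are hypothetical names, not msco objects.
set.seed(42)
n <- 50    # assumed number of observations
p <- 0.8   # proportion of the data used for training

train_idx <- sample(seq_len(n), size = floor(p * n))  # rows used to fit the model
test_idx  <- setdiff(seq_len(n), train_idx)           # held-out rows used for testing

length(train_idx)
length(test_idx)
```

With p = 0.8, four fifths of the rows are used for fitting and the remaining fifth is held out for computing the test errors.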
The k-fold cross-validation approach is highly recommended for its computational
efficiency and acceptable bias-variance trade-off, provided that k is set to
either 5 or 10 (Lagat et al., 2021b). For more details on the other
cross-validation approaches, see Lagat et al. (2021c).
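The general k-fold procedure can be sketched in base R as follows. This is a hedged illustration rather than the msco implementation: a simple linear model fitted with lm stands in for the generalised B-spline model, the data are simulated, and all object names (dat, folds, fold_rmse, cv_rmse) are assumptions made for the example.

```r
# Illustrative k-fold cross-validation with k = 5.
# lm() is a stand-in for the generalised B-spline model; the data are simulated.
set.seed(1)
n   <- 100
dat <- data.frame(x = runif(n))
dat$y <- 2 * dat$x + rnorm(n, sd = 0.1)

k     <- 5
folds <- sample(rep(seq_len(k), length.out = n))  # randomly assign each row to a fold

# For each fold: fit on the other k - 1 folds, then predict the held-out fold.
fold_rmse <- vapply(seq_len(k), function(i) {
  train <- dat[folds != i, ]
  test  <- dat[folds == i, ]
  fit   <- lm(y ~ x, data = train)
  sqrt(mean((test$y - predict(fit, newdata = test))^2))
}, numeric(1))

cv_rmse <- mean(fold_rmse)  # average test error across the k folds
cv_rmse
```

Each model is fitted only k times, which is why k = 5 or 10 is much cheaper than LOOCV (equivalent to k = n) on larger data sets.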
Depending on the type of cross-validation approach implemented, the cross_valid
function returns:
a data.frame with the following test errors (for "validation.set"):
RMSE: the root mean squared error;
R_squared: Pearson's r^2, and
MAE: the mean absolute error.
an array with the test errors listed above, together with the type of
regression model used, the sample size, the number of predictors,
the type of cross-validation performed, and a summary of sample sizes
(for "k-fold", "LOOCV", and "repeated.k-fold").
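For reference, the three test errors can be computed from observed and predicted values as in the sketch below. The helper test_errors and the vectors obs and pred are hypothetical and are not part of msco; the example only shows the standard definitions of RMSE, squared Pearson correlation, and MAE.

```r
# Hypothetical helper (not an msco function): standard definitions of the
# three test errors, returned as a one-row data.frame.
test_errors <- function(obs, pred) {
  data.frame(
    RMSE      = sqrt(mean((obs - pred)^2)),  # root mean squared error
    R_squared = cor(obs, pred)^2,            # squared Pearson correlation
    MAE       = mean(abs(obs - pred))        # mean absolute error
  )
}

obs  <- c(1, 2, 3, 4, 5)            # made-up observed values
pred <- c(1.1, 1.9, 3.2, 3.8, 5.1)  # made-up predictions
errs <- test_errors(obs, pred)
errs
```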
Fushiki, T. (2011). Estimation of prediction error by using K-fold cross-validation. Stat. Comput. 21, 137-146. https://doi.org/10.1007/s11222-009-9153-8
Lagat, V. K., Latombe, G. and Hui, C. (2021b). Dissecting the effects of random encounter versus functional trait mismatching on multi-species co-occurrence and interference with generalised B-spline modelling. DOI: <To be added>.
Lagat, V. K., Latombe, G. and Hui, C. (2021c). msco: an R software package for null model testing of multi-species interactions and interference with covariates. DOI: <To be added>.
Pearson, K. (1895). VII. Note on regression and inheritance in the case of two parents. Proceedings of the Royal Society of London, 58:240-242. https://doi.org/10.1098/rspl.1895.0041
## Not run:
my.path <- system.file("extdata/gsmdat", package = "msco")
setwd(my.path)
s.data <- get(load("s.data.csv")) ## Species-by-site matrix
t.data <- get(load("t.data.csv")) ## Species-by-trait matrix
p.d.mat <- get(load("p.d.mat.csv")) ## Species-by-species phylogenetic distance matrix
gbsm_obj <- msco::gbsm(s.data, t.data, p.d.mat, metric = "Simpson_eqn", d.f = 4,
                       order.jo = 3, degree = 3, n = 1000, b.plots = FALSE, scat.plot = FALSE,
                       response.curves = FALSE, leg = 1, max.vif, max.vif2, start.range = c(-0.1, 0))
val.set <- msco::cross_valid(gbsm_obj, type="validation.set", p=0.8)
val.set
kfold <- msco::cross_valid(gbsm_obj, type="k-fold", k=5)
kfold
loocv <- msco::cross_valid(gbsm_obj, type="LOOCV")
loocv
repeated.kfold <- msco::cross_valid(gbsm_obj, type="repeated.k-fold", k=5, k_fold.repeats=100)
repeated.kfold
## End(Not run)