kFoldCV: Function for conducting k-fold cross-validation


kFoldCV: R Documentation

Function for conducting k-fold cross-validation

Description

kFoldCV conducts a k-fold cross-validation for parametric and smooth land use regression (LUR) models fitted with the functions parLUR and smoothLUR, respectively.

Usage

kFoldCV(
  data,
  x,
  ID,
  spVar1,
  spVar2,
  y,
  dirEff,
  thresh = 0.95,
  seed = 42,
  k = 10,
  strat = FALSE,
  indRegions = NULL,
  loocv = FALSE
)

Arguments

data

A data set which contains the dependent variable and the potential predictors.

x

A character vector stating the variable names of the potential predictors (names have to match the column names of 'data').

ID

A character string stating the variable name referring to the monitoring sites' ID (name has to match the column name of 'data').

spVar1

A character string stating the variable name referring to longitude (name has to match the column name of 'data').

spVar2

A character string stating the variable name referring to latitude (name has to match the column name of 'data').

y

A character string that indicates the name of the dependent variable (name has to match the column name of 'data').

dirEff

A vector that contains one entry for each potential predictor and indicates the expected direction of the effect of the potential predictor (1 for positive, -1 for negative and 0 if the expected effect sign is unclear). Argument defaults to NULL and is only required for parametric model fitting.

thresh

A numeric value that indicates the maximum share of zero values a potential predictor may contain; if the share is exceeded, the corresponding potential predictor is excluded (defaults to 0.95).

seed

A numeric value that defines the seed for random sampling (defaults to 42).

k

An integer denoting the number of folds to use in cross-validation (defaults to 10).

strat

A boolean value that indicates whether stratified sampling is desired (stratified spatially w.r.t. German federal states); defaults to FALSE.

indRegions

A character string that indicates the name of the variable referring to the geographic regions; this variable is required to perform spatial stratified sampling; defaults to NULL.

loocv

A boolean value that indicates whether a leave-one-out cross-validation is desired, i.e., a k-fold CV with 'k' equal to the number of rows in 'data' (defaults to FALSE).
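
How 'seed', 'k', 'strat', 'indRegions', and 'loocv' could interact when sites are assigned to folds can be sketched in base R. This is a minimal illustration under stated assumptions, not the code used inside kFoldCV: the helper assign_folds and its argument 'strata' are hypothetical names introduced here, and the package may sample differently.

```r
# Hypothetical helper (not part of smoothLUR): assign n rows to folds.
assign_folds <- function(n, k = 10, seed = 42, strata = NULL, loocv = FALSE) {
  # Leave-one-out: every row forms its own fold, so k equals n.
  if (loocv) return(seq_len(n))

  set.seed(seed)  # makes the random assignment reproducible
  shuffle <- function(v) v[sample.int(length(v))]

  # Plain k-fold: shuffle a balanced vector of fold labels.
  if (is.null(strata)) return(shuffle(rep_len(seq_len(k), n)))

  # Stratified k-fold: shuffle fold labels separately within each region,
  # so every region is spread across the folds.
  folds <- integer(n)
  for (idx in split(seq_len(n), strata)) {
    folds[idx] <- shuffle(rep_len(seq_len(k), length(idx)))
  }
  folds
}

# Usage: 12 sites in 3 folds, then the same sites stratified by a made-up region label.
folds_plain <- assign_folds(n = 12, k = 3, seed = 42)
regions     <- rep(c("north", "south"), each = 6)
folds_strat <- assign_folds(n = 12, k = 3, seed = 42, strata = regions)
table(regions, folds_strat)
```

In this sketch, stratification only balances the number of sites per region within each fold; it says nothing about how kFoldCV treats regions with very few sites.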

Value

An object of class 'kfcvLUR' with the following elements:

df.err

data.frame with four columns: ID (ID of the monitoring site), Fold (number of the fold the monitoring site is assigned to), Err.par (errors derived from the parametric LUR model), Err.smooth (errors derived from the smooth LUR model)

ls.models

list with 'k' elements; each list element is named according to the omitted fold and is itself a list containing two elements: mod.par (parametric model based on the remaining sites), mod.smooth (smooth model based on the remaining sites)

It has '...', '...', and '...' methods.
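
The error columns described for 'df.err' lend themselves to summary measures such as RMSE and MAE. The sketch below uses a mock data frame with the four documented column names; the values are invented for illustration, and how kFoldCV defines the sign of the errors is not stated here, so only sign-independent measures are computed.

```r
# Mock stand-in for the 'df.err' element; all values are invented.
df_err <- data.frame(
  ID         = paste0("site", 1:6),
  Fold       = rep(1:3, each = 2),
  Err.par    = c(1.5, -2.0, 0.5, 3.0, -1.0, 2.5),
  Err.smooth = c(1.0, -1.5, 0.0, 2.0, -0.5, 2.0)
)

rmse <- function(e) sqrt(mean(e^2))  # root mean squared error
mae  <- function(e) mean(abs(e))     # mean absolute error

# Overall accuracy of both model types across all held-out sites.
overall <- sapply(df_err[c("Err.par", "Err.smooth")],
                  function(e) c(RMSE = rmse(e), MAE = mae(e)))

# RMSE per fold, to see whether single folds drive the overall error.
by_fold <- aggregate(cbind(Err.par, Err.smooth) ~ Fold, data = df_err, FUN = rmse)

overall
by_fold
```

With a real result, the mock data frame would be replaced by the 'df.err' element of the returned 'kfcvLUR' object.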

Author(s)

Svenia Behm and Markus Fritsch

See Also

parLUR for parametric land use regression (LUR) modeling. smoothLUR for smooth land use regression (LUR) modeling.

Examples

## Load data set
data(monSitesDE, package="smoothLUR")
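
A call following the usage signature might look like the sketch below. The column names ("ID", "Y", "Lon", "Lat", "Alt", "PopDens", "IndRegions") and the 'dirEff' entries are assumptions made for illustration and are not confirmed by this page; check names(monSitesDE) before running. The call is guarded so that it is skipped when the package or the assumed columns are unavailable, and model fitting across all folds may take a while.

```r
## Assumed predictor names and expected effect directions (illustrative only)
preds   <- c("Lon", "Lat", "Alt", "PopDens")
dir_eff <- c(0, 0, -1, 1)  # one entry per predictor: 1, -1, or 0

if (requireNamespace("smoothLUR", quietly = TRUE)) {
  data(monSitesDE, package = "smoothLUR")
  needed <- c(preds, "ID", "Y", "IndRegions")  # assumed column names

  if (all(needed %in% names(monSitesDE))) {
    ## 10-fold CV, stratified by the (assumed) region indicator
    cv10 <- smoothLUR::kFoldCV(
      data = monSitesDE, x = preds, ID = "ID",
      spVar1 = "Lon", spVar2 = "Lat", y = "Y",
      dirEff = dir_eff, thresh = 0.95, seed = 42, k = 10,
      strat = TRUE, indRegions = "IndRegions"
    )
    head(cv10$df.err)
  } else {
    message("Assumed column names not found; adapt them to names(monSitesDE).")
  }
}
```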


markusfritsch/smoothLUR documentation built on Nov. 5, 2022, 3:42 p.m.