# Regression with a functional covariate: Cross-validation analysis of the Tecator data set

*From the iprior package: Regression Modelling using I-Priors*

```r
library(iprior)
```

Continuing on Section 3.4 of the vignette, we revisit the code used to obtain out-of-sample test error rates, and extend the analysis to a leave-one-out cross-validation (LOOCV) scheme.

# Easy train/test split

The `iprior()` function includes the `train.samp` argument to specify which data samples should be used for training; any remaining samples are held out for testing, and the out-of-sample test error rates are then reported alongside the training error rates. The examples in the vignette can be reproduced as follows:

```r
data(tecator, package = "caret")
fat <- endpoints[, 2]
absorp <- -t(diff(t(absorp)))  # take first differences
mod1 <- iprior(fat, absorp, train.samp = 1:172, method = "mixed")
```

The prediction error (training and test) can then be obtained easily:

```r
get_prederror(mod1)
```
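The held-out error reported above can also be computed by hand from predictions on the test samples. The sketch below uses base R only, with `lm()` on synthetic data standing in for the I-prior fit (all names here are illustrative, not part of the `iprior` API), but it mimics the same `train.samp = 1:172` split:

```r
set.seed(1)
# synthetic stand-ins for fat and absorp (illustrative only)
n <- 215
x <- rnorm(n)
y <- 2 * x + rnorm(n, sd = 0.5)
dat <- data.frame(x = x, y = y)

train.idx <- 1:172                      # same split as train.samp = 1:172
fit <- lm(y ~ x, data = dat[train.idx, ])

# out-of-sample predictions for the remaining 43 observations
pred <- predict(fit, newdata = dat[-train.idx, ])
test.rmse <- sqrt(mean((y[-train.idx] - pred)^2))
test.rmse
```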

# LOOCV experiment

With the above conveniences, it is easy to wrap this in a loop to perform $k$-fold cross-validation; this is done in the `iprior_cv()` function. We now analyse the predictive performance of I-prior models using a LOOCV scheme. For each of the `n = 215` samples, one observation pair is left out and the model trained on the rest; the prediction error is then obtained for the observation that was left out. This is repeated for all `n = 215` samples, and the average of the prediction errors calculated.
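The LOOCV scheme itself can be sketched in base R, independently of the `iprior` package. The toy example below uses `lm()` on synthetic data (a smaller `n` than the Tecator analysis, purely for illustration):

```r
set.seed(1)
n <- 50                      # kept small; the Tecator analysis uses n = 215
x <- rnorm(n)
y <- 2 * x + rnorm(n, sd = 0.5)
dat <- data.frame(x = x, y = y)

# leave one observation out, train on the rest, predict the held-out point
errs <- sapply(seq_len(n), function(i) {
  fit <- lm(y ~ x, data = dat[-i, ])
  y[i] - predict(fit, newdata = dat[i, , drop = FALSE])
})

# average the squared prediction errors over all n left-out observations
loocv.rmse <- sqrt(mean(errs^2))
```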

For the linear RKHS, the code to perform the LOOCV in the `iprior` package is as follows:

```r
(mod1.cv <- iprior_cv(fat, absorp, method = "em",
                      control = list(stop.crit = 1e-2), folds = Inf))
```

```r
data(tecator.cv, package = "iprior")
print(tecator.cv[[1]], "RMSE")
```

Notice the argument `folds = Inf`: since the `iprior_cv()` function essentially performs a $k$-fold cross-validation experiment, setting `folds` equal to the sample size or higher tells the function to perform LOOCV. Also note that the EM algorithm was used to fit the model, with the stopping criterion relaxed to `1e-2`; this offered faster convergence without affecting predictive ability. The resulting fit gives training and test mean squared error (MSE) for the cross-validation experiment.
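To see why setting `folds` at or above the sample size collapses to LOOCV, consider how $k$-fold indices are typically assigned. This is a generic sketch (the helper `make_folds()` is hypothetical, not the internals of `iprior_cv()`):

```r
n <- 215

# assign each of the n observations to one of k folds at random;
# capping k at n means k = Inf yields one observation per fold, i.e. LOOCV
make_folds <- function(n, k) {
  k <- min(k, n)
  sample(rep(seq_len(k), length.out = n))
}

set.seed(1)
fold.id <- make_folds(n, Inf)
table(table(fold.id))  # with k = n, every fold contains exactly one observation
```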

The code for the remaining models is given below. As this takes quite a long time to run, it was run locally and the results saved in the data set `tecator.cv` within the `iprior` package.

```r
mod2.cv <- iprior_cv(fat, absorp, method = "em", folds = Inf, kernel = "poly2",
                     est.offset = TRUE, control = list(stop.crit = 1e-2))
mod3.cv <- iprior_cv(fat, absorp, method = "em", folds = Inf, kernel = "poly3",
                     est.offset = TRUE, control = list(stop.crit = 1e-2))
mod4.cv <- iprior_cv(fat, absorp, method = "em", folds = Inf, kernel = "fbm",
                     control = list(stop.crit = 1e-2))
mod5.cv <- iprior_cv(fat, absorp, method = "em", folds = Inf, kernel = "fbm",
                     est.hurst = TRUE, control = list(stop.crit = 1e-2))
mod6.cv <- iprior_cv(fat, absorp, method = "em", folds = Inf, kernel = "se",
                     est.lengthscale = TRUE, control = list(stop.crit = 1e-2))

# Function to tabulate the results
tecator_tab_cv <- function() {
  tab <- t(sapply(list(mod1.cv, mod2.cv, mod3.cv, mod4.cv, mod5.cv, mod6.cv),
                  function(mod) {
                    # drop the first column of mod$mse, take square roots
                    # (i.e. RMSE), and average over the folds
                    res <- as.numeric(apply(sqrt(mod$mse[, -1]), 2, mean))
                    c("Training RMSE" = res[1], "Test RMSE" = res[2])
                  }))
  rownames(tab) <- c("Linear", "Quadratic", "Cubic", "fBm-0.5", "fBm-MLE",
                     "SE-MLE")
  tab
}
```

The results are tabulated below.

```r
knitr::kable(iprior::dec_plac(tecator.cv[[7]], 2), align = "r",
             caption = "Results for the LOOCV experiment for various I-prior models.")
```


iprior documentation built on Oct. 18, 2018, 1:18 a.m.