library(knitr)
knitr::opts_chunk$set(fig.align="center")

Preamble

Load the rgs3 package (available here):

suppressPackageStartupMessages(library(rgs3))
packageVersion("rgs3")

Choose the number of cores to run cross validation in parallel (if available):

suppressPackageStartupMessages(library(parallel))
nb.cores <- max(detectCores() - 1, 1)

Then, read first the vignette intro-rgs3 before this one.

This R chunk is used to assess how much time it takes to execute the R code in this document until the end:

t0 <- proc.time()

Data and model

This vignette uses the same data as the intro-rgs3 vignette.

Load data into R

Load phenotypes:

phenos.file <- system.file("extdata", "phenos_df.txt.gz", package="rgs3")
tools::md5sum(path.expand(phenos.file))
phenos <- read.table(phenos.file, header=TRUE)
phenos$year <- as.factor(phenos$year)
str(phenos)

Load genotypes:

genos.file <- system.file("extdata", "genos_mat.txt.gz", package="rgs3")
tools::md5sum(path.expand(genos.file))
genos <- as.matrix(read.table(genos.file))
dim(genos)

Set up the configuration

Let us choose an identifier for our analysis, which will be used for all files, so that it will be easy to remove them at the end:

task.id <- "task-execGS3_crossval-rgs3"

Use the utility function:

ptl <- data.frame(position = c(which(colnames(phenos) == "year"),
                               ncol(phenos) + 1),
                  type = c("cross",
                           "add_SNP"),
                  nlevels = c(length(levels(phenos$year)),
                              0),
                  stringsAsFactors = FALSE)
config <- getDefaultConfig(
    nb.snps = ncol(genos),
    rec.id = which(colnames(phenos) == "geno"),
    twc = c(which(colnames(phenos) == "response1"), 0),
    method = "VCE",
    ptl = ptl,
    task.id = task.id)

Customize the rest:

vc <- data.frame(var = c("vara","vard","varg","varp", "vare"),
                 exp = c("0.025", "0", "0", "0", "1"),
                 df = rep("2", 5),
                 stringsAsFactors = FALSE)
config$vc <- vc
config$niter <- 2000
config$burnin <- 200
config$thin <- 2

Note that the number of iterations here is voluntarily low, so that the vignette doesn't take too long to run.

Perform K-fold cross-validation

Let's use 5 folds for this vignette:

results.cv <- crossValWithGs3(
    genos = genos,
    dat = phenos,
    config = config,
    task.id = task.id,
    nb.folds = 5,
    seed = 123,
    remove.files = "all",
    nb.cores = nb.cores,
    verbose = 2)

Assess the results

results.cv
apply(results.cv[,c("rmspe","cor.p","cor.p.div.h","reg.intercept","reg.slope")],
      2, function(x){
        c(mean=mean(x), sd=sd(x))
      })

It is also possible to have multiple replicates, using the function crossValRepWithGs3.

Acknowledgments

Appendix

t1 <- proc.time(); t1 - t0
print(sessionInfo(), locale=FALSE)


INRA/rgs3 documentation built on May 20, 2019, 5:24 p.m.