In INRA/rgs3: R package wrapping the GS3 program for genomic selection

library(knitr)
knitr::opts_chunk$set(fig.align="center")

Preamble

Load the rgs3 package (available here):

suppressPackageStartupMessages(library(rgs3))
packageVersion("rgs3")

Choose the number of cores to run cross validation in parallel (if available):

suppressPackageStartupMessages(library(parallel))
nb.cores <- max(detectCores() - 1, 1)

Then, read first the vignette intro-rgs3 before this one.

This R chunk is used to assess how much time it takes to execute the R code in this document until the end:

t0 <- proc.time()

Data and model

This vignette uses the same data as the intro-rgs3 vignette.

Load data into R

Load phenotypes:

phenos.file <- system.file("extdata", "phenos_df.txt.gz", package="rgs3")
tools::md5sum(path.expand(phenos.file))
phenos <- read.table(phenos.file, header=TRUE)
phenos$year <- as.factor(phenos$year)
str(phenos)

Load genotypes:

genos.file <- system.file("extdata", "genos_mat.txt.gz", package="rgs3")
tools::md5sum(path.expand(genos.file))
genos <- as.matrix(read.table(genos.file))
dim(genos)

Set up the configuration

Let us choose an identifier for our analysis, which will be used for all files, so that it will be easy to remove them at the end:

task.id <- "task-execGS3_crossval-rgs3"

Use the utility function:

ptl <- data.frame(position = c(which(colnames(phenos) == "year"),
                               ncol(phenos) + 1),
                  type = c("cross",
                           "add_SNP"),
                  nlevels = c(length(levels(phenos$year)),
                              0),
                  stringsAsFactors = FALSE)
config <- getDefaultConfig(
    nb.snps = ncol(genos),
    rec.id = which(colnames(phenos) == "geno"),
    twc = c(which(colnames(phenos) == "response1"), 0),
    method = "VCE",
    ptl = ptl,
    task.id = task.id)

Customize the rest:

vc <- data.frame(var = c("vara","vard","varg","varp", "vare"),
                 exp = c("0.025", "0", "0", "0", "1"),
                 df = rep("2", 5),
                 stringsAsFactors = FALSE)
config$vc <- vc
config$niter <- 2000
config$burnin <- 200
config$thin <- 2

Note that the number of iterations here is voluntarily low, so that the vignette doesn't take too long to run.

Perform K-fold cross-validation

Let's use 5 folds for this vignette:

results.cv <- crossValWithGs3(
    genos = genos,
    dat = phenos,
    config = config,
    task.id = task.id,
    nb.folds = 5,
    seed = 123,
    remove.files = "all",
    nb.cores = nb.cores,
    verbose = 2)

Assess the results

results.cv
apply(results.cv[,c("rmspe","cor.p","cor.p.div.h","reg.intercept","reg.slope")],
      2, function(x){
        c(mean=mean(x), sd=sd(x))
      })

It is also possible to have multiple replicates, using the function crossValRepWithGs3.

Acknowledgments

Hélène Muranty

Appendix

t1 <- proc.time(); t1 - t0
print(sessionInfo(), locale=FALSE)

INRA/rgs3 documentation built on May 20, 2019, 5:24 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

INRA/rgs3
R package wrapping the GS3 program for genomic selection

In INRA/rgs3: R package wrapping the GS3 program for genomic selection

Preamble

Data and model

Load data into R

Set up the configuration

Perform K-fold cross-validation

Assess the results

Acknowledgments

Appendix

R Package Documentation

Browse R Packages

We want your feedback!

INRA/rgs3 R package wrapping the GS3 program for genomic selection

In INRA/rgs3: R package wrapping the GS3 program for genomic selection

Preamble

Data and model

Load data into R

Set up the configuration

Perform K-fold cross-validation

Assess the results

Acknowledgments

Appendix

R Package Documentation

Browse R Packages

We want your feedback!

INRA/rgs3
R package wrapping the GS3 program for genomic selection