Description

Cross-validation function used in combination with BGLR.
Usage

crossValidate(x, id = "GERMPLASM", factor = "LOCATION", k = 5,
  replication = 3, seed = NULL, exclusive = TRUE,
  sampling = c("randomByID", "randomAccrossFactor", "randomByFactor",
    "randomWithinFactor", "popStructureAccrossFactor", "popStructureWithinFactor",
    "commit", "incompleteTrial"), trainingSet = NULL, validationSet = NULL,
  populationStructure = NULL, verbose = FALSE)
Arguments

x: a data frame containing at least the columns named by the id and factor arguments.
id: character string giving the name of the column in x that holds the entry IDs. Default is "GERMPLASM".
factor: character string giving the name of the column in x that holds the factor used in the cross-validation. Default is "LOCATION", referring to the geographical locations of a multi-location field trial.
k: integer giving the number of folds for k-fold cross-validation; k should lie in [2, nrow(y)], where y refers to the phenotypic values. Default is 5.
replication: numeric giving the number of replications of the cross-validation. Default is 3.
seed: numeric seed passed to set.seed, so that the randomization can be reproduced by the user. Default is NULL, in which case 123 is used as the seed.
exclusive: logical; whether sampling should be done without replacement. The value is passed, negated, to the replace argument of the sample.int function, i.e. exclusive = TRUE means replace = FALSE, so that the probability of choosing the next item is proportional to the weights amongst the remaining items. Default is TRUE.
sampling: character string specifying the sampling strategy used in the cross-validation, one of "randomByID", "randomAccrossFactor", "randomByFactor", "randomWithinFactor", "popStructureAccrossFactor", "popStructureWithinFactor", "commit" or "incompleteTrial". If sampling is "commit", the sets of names have to be specified in the trainingSet and validationSet arguments.
trainingSet: character vector of the observations in the training set (used with sampling = "commit").
validationSet: character vector of the observations in the validation (test) set (used with sampling = "commit").
populationStructure: vector of length nrow(y) assigning each individual to a group of the population structure, where y refers to the phenotypic values. This argument is only required when sampling = "popStructureAccrossFactor" or sampling = "popStructureWithinFactor".
verbose: logical; whether to print information about the progress of the cross-validation. Default is FALSE.
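The seed and exclusive arguments map onto base R's set.seed and sample.int. The sketch below is not taken from the package source; it only illustrates that mapping under the documented behaviour: 123 is substituted when seed is NULL, and exclusive is negated into replace. The helper name drawSample is hypothetical.

```r
# Hypothetical helper (not the package's internal code) showing how
# `seed` and `exclusive` could be mapped onto base R sampling.
drawSample <- function(n, size, seed = NULL, exclusive = TRUE) {
  if (is.null(seed)) seed <- 123             # documented default when seed = NULL
  set.seed(seed)                             # makes the draw reproducible
  sample.int(n, size, replace = !exclusive)  # exclusive = TRUE -> replace = FALSE
}

a <- drawSample(10, 10)                         # a permutation of 1:10, no repeats
b <- drawSample(10, 10)                         # same seed, so identical to `a`
withRepl <- drawSample(10, 10, exclusive = FALSE)  # with replacement; repeats possible
```

Note that set.seed changes the global random number state, which is why repeating the call with the same seed reproduces the same draw.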
Details

In cross-validation (CV) the data set is split into a training set and a validation (test) set. To assign observations to these sets, k-fold cross-validation is applied: the data set is split into k subsets, k-1 of which form the training set while the remaining one is the test set, and this is repeated so that each subset serves once as the test set. The function is based on the crossVal function from the synbreed package. We made it more flexible by separating out the cross-validation schemes, so that additional user-defined CV schemes can be plugged in easily. Further, the function was adapted to work with the BGLR framework.
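A minimal base-R sketch of the k-fold idea described above, assuming plain random assignment of observations to folds of (almost) equal size. The function kFoldAssign is hypothetical and illustrates the general technique only; it is not the crossVal or crossValidate implementation, and it ignores the factor and population-structure schemes.

```r
# Hypothetical sketch: randomly assign n observations to k folds of
# (almost) equal size, then derive the training/test split for each fold.
kFoldAssign <- function(n, k, seed = 123) {
  stopifnot(k >= 2, k <= n)
  set.seed(seed)
  # repeat the fold labels 1..k up to length n, then shuffle them
  sample(rep_len(seq_len(k), n))
}

folds <- kFoldAssign(n = 23, k = 5)

splits <- lapply(seq_len(5), function(i) {
  list(test  = which(folds == i),  # one fold held out as the test set
       train = which(folds != i))  # the remaining k-1 folds form the training set
})
```

Every observation lands in exactly one test set, so each one is predicted once per replication.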
Value

A data frame with the result of assigning the entries to k folds, repeated for a user-defined number of replications. The table includes the following columns:
ID: the names of the observations.
Rep[x]: one column per replication, with x running from 1 to the value of the replication argument, giving the fold (1 to k) to which each observation is assigned.
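To make the shape of the returned table concrete, the following sketch builds a data frame with an ID column and one fold column per replication, reshuffling the folds for each replication. It assumes the replication columns are named Rep1, Rep2, ...; the helper foldTable and the toy IDs are hypothetical and only mimic the layout described above.

```r
# Hypothetical sketch of the returned layout: one row per observation,
# one column of fold numbers (1..k) per replication.
foldTable <- function(ids, k = 5, replication = 3, seed = 123) {
  set.seed(seed)  # a single seed for the whole table keeps it reproducible
  out <- data.frame(ID = ids, stringsAsFactors = FALSE)
  for (r in seq_len(replication)) {
    # each replication reshuffles the fold labels independently
    out[[paste0("Rep", r)]] <- sample(rep_len(seq_len(k), length(ids)))
  }
  out
}

scheme <- foldTable(ids = paste0("G", 1:12), k = 5, replication = 3)
head(scheme)
```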
References

[1] Albrecht T, Wimmer V, Auinger HJ, Erbe M, Knaak C, Ouzunova M, Simianer H, Schoen CC (2011) Genome-based prediction of testcross values in maize. Theor Appl Genet 123:339-350.
[2] de los Campos G, Perez Rodriguez P (2014) BGLR: Bayesian Generalized Linear Regression. R package version 1.0.3. http://CRAN.R-project.org/package=BGLR
Examples

data(exampleCV)
scheme1 <- crossValidate(x=exampleCV, id="GERMPLASM", factor="LOCATION",
k=5, replication=3, seed=NULL, exclusive=TRUE,
sampling="randomByFactor",verbose=TRUE)
scheme2 <- crossValidate(x=exampleCV, id="GERMPLASM", factor="LOCATION",
k=5, replication=3, seed=NULL, exclusive=TRUE,
sampling="incompleteTrial",verbose=TRUE)
scheme3 <- crossValidate(x=exampleCV, id="GERMPLASM", factor="LOCATION",
k=5, replication=3, seed=NULL, exclusive=TRUE,
sampling="randomAccrossFactor",verbose=TRUE)
scheme4 <- crossValidate(x=exampleCV, id="GERMPLASM", factor="LOCATION",
k=5, replication=3, seed=NULL, exclusive=TRUE,
sampling="randomWithinFactor",verbose=TRUE)
scheme5 <- crossValidate(x=exampleCV, id="GERMPLASM", factor="LOCATION",
k=5, replication=3, seed=NULL, exclusive=TRUE,
sampling="randomByID",verbose=TRUE)
head(scheme1)
head(scheme2)
head(scheme3)
head(scheme4)
head(scheme5)
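One way a fold assignment such as a replication column of scheme1 could drive a CV loop is sketched below. To stay self-contained the sketch simulates its own data, generates its own fold vector, and uses lm() as a stand-in for a BGLR model; all of these are assumptions for illustration. The correlation between observed and predicted values is reported as predictive ability.

```r
# Hypothetical sketch: k-fold loop driven by a fold vector.
# lm() stands in for a BGLR model; the data are simulated.
set.seed(1)
n <- 50
dat <- data.frame(ID = paste0("G", seq_len(n)), x = rnorm(n))
dat$y <- 2 * dat$x + rnorm(n, sd = 0.5)

k <- 5
fold <- sample(rep_len(seq_len(k), n))  # plays the role of one replication column

pred <- numeric(n)
for (i in seq_len(k)) {
  test <- fold == i
  fit <- lm(y ~ x, data = dat[!test, ])               # fit on the other k-1 folds
  pred[test] <- predict(fit, newdata = dat[test, ])   # predict the held-out fold
}

# predictive ability: correlation between observed and predicted values
predAbility <- cor(dat$y, pred)
```

With several replication columns, the same loop would simply be repeated per column and the predictive abilities averaged.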