pcss.core | R Documentation
Generate a Core Collection with Principal Component Scoring Strategy (PCSS) [hamon_proposed_1990; noirot_principal_1996; noirot_method_2003] using qualitative and/or quantitative trait data.
pcss.core(
  data,
  names,
  quantitative,
  qualitative,
  eigen.threshold = NULL,
  size = 0.2,
  var.threshold = 0.75
)
| Argument | Description |
| --- | --- |
| data | The data as a data frame object. The data frame should have one row per individual, with one column of individual names and multiple columns of trait/character data. |
| names | Name of the column with the individual/genotype names, as a character string. |
| quantitative | Names of the columns with the quantitative traits, as a character vector. |
| qualitative | Names of the columns with the qualitative traits, as a character vector. |
| eigen.threshold | The lower limit of the eigenvalue of factors to be included in the estimation. The default value (NULL) is the average of all the eigenvalues. |
| size | The desired core set size, as a proportion of the entire collection. |
| var.threshold | The desired proportion of total variability to be retained in the core set. |
A core collection is constituted from an entire collection of $N$ genotypes using quantitative data of $J$ traits with the Principal Component Scoring Strategy (PCSS) [hamon_proposed_1990; noirot_principal_1996; noirot_method_2003] as follows:
Principal Component Analysis (PCA) is performed on the standardized genotype $\times$ trait data. This removes the multicollinearity between the traits and generates $J$ standardized, mutually uncorrelated variables (factors or principal components).
Considering only a subset of $K$ factors, the Generalized Sum of Squares (GSS) of the $N$ individuals in the $K$-dimensional factorial space is computed as $N \times K$.
$K$ can be the number of factors whose eigenvalue $\lambda$ exceeds a threshold value, such as 1 (Kaiser-Guttman criterion) or the average of all the eigenvalues.
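As a minimal base-R sketch of this choice of $K$, the retained factors can be counted against either threshold. The helper `choose_k` and the use of the built-in `mtcars` data are illustrative assumptions, not part of rpcss. For PCA on standardized traits the eigenvalues of the correlation matrix sum to $J$, so their average is exactly 1 and the two criteria coincide; they need not coincide for CA, MCA or FAMD.

```r
# Hypothetical helper (not part of rpcss): count factors whose eigenvalue
# exceeds a threshold. threshold = NULL mimics the documented default,
# i.e. the average of all the eigenvalues.
choose_k <- function(eigenvalues, threshold = NULL) {
  if (is.null(threshold)) threshold <- mean(eigenvalues)
  sum(eigenvalues > threshold)
}

# Eigenvalues of the correlation matrix, i.e. PCA on standardized traits
X <- as.matrix(mtcars[, c("mpg", "disp", "hp", "drat", "wt", "qsec")])
ev <- eigen(cor(X), symmetric = TRUE, only.values = TRUE)$values

choose_k(ev)      # K by the average-eigenvalue criterion
choose_k(ev, 1)   # K by the Kaiser-Guttman criterion
mean(ev)          # 1 (up to rounding) for a correlation matrix
```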
The contribution of the $i$th genotype to the GSS ($P_i$), i.e. to the total variability, is calculated as:
$$P_i = \sum_{j=1}^{K} x_{ij}^{2}$$
where $x_{ij}$ is the component score (coordinate) of the $i$th genotype on the $j$th principal component.
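Assuming each factor is standardized so that $\frac{1}{N}\sum_{i=1}^{N} x_{ij}^{2} = 1$, summing $P_i$ over all genotypes recovers the GSS of $N \times K$:

```latex
\mathrm{GSS} = \sum_{i=1}^{N} P_i
             = \sum_{j=1}^{K} \sum_{i=1}^{N} x_{ij}^{2}
             = \sum_{j=1}^{K} N
             = N \times K
```

Under the same assumption, the relative contributions $CR_i = P_i / (N \times K)$ of all genotypes sum to 1.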
For each genotype, its relative contribution to the GSS (total variability) is computed as:
$$CR_i = \frac{P_i}{N \times K}$$
The genotypes are sorted in descending order of their contribution to the GSS, and the cumulative contribution of successive genotypes to the GSS is computed.
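The scoring steps above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the pcss.core implementation: `pcss_scores` is a hypothetical helper, PCA is done with `prcomp` rather than FactoMineR, $K$ is fixed by hand, and each factor is rescaled to unit variance (divisor $N$) so that the GSS equals $N \times K$.

```r
# Hypothetical helper illustrating PCSS scoring (not the pcss.core source).
pcss_scores <- function(X, K) {
  Z <- scale(X)                                     # standardize each trait
  pc <- prcomp(Z, center = FALSE, scale. = FALSE)   # PCA on standardized data
  S <- pc$x[, seq_len(K), drop = FALSE]             # scores on the first K factors
  N <- nrow(S)
  # Assumption: rescale each factor so that sum_i x_ij^2 = N
  # (unit variance with divisor N); the GSS is then exactly N * K.
  S <- sweep(S, 2, sqrt(colSums(S^2) / N), "/")
  P <- rowSums(S^2)                                 # P_i: contribution to GSS
  CR <- P / (N * K)                                 # CR_i: relative contribution
  ord <- order(P, decreasing = TRUE)                # sort genotypes by contribution
  data.frame(genotype = rownames(X)[ord],
             P = P[ord],
             CR = CR[ord],
             cum.pct = 100 * cumsum(CR[ord]),       # cumulative % of GSS
             row.names = NULL)
}

X <- as.matrix(mtcars[, c("mpg", "disp", "hp", "drat", "wt", "qsec")])
res <- pcss_scores(X, K = 2)
head(res)
```

The last row of `cum.pct` is always 100, which is the asymptote $A$ used by the logistic method below.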
The core collection can then be selected by one of three methods:
1. Selection of a fixed proportion, percentage or number of the top accessions.
2. Selection of the top accessions that together contribute up to a fixed percentage of the GSS.
3. Fitting a logistic regression model of the following form to the cumulative contribution of successive genotypes to the GSS [balakrishnan_method_2000]:
$$\frac{y}{A-y} = e^{a + bn}$$
Taking natural logarithms linearizes the equation as:
$$\log_e \left( \frac{y}{A-y} \right) = a + bn$$
where $a$ and $b$ are the intercept and regression coefficient, respectively; $y$ is the cumulative contribution of successive genotypes to the GSS; $n$ is the rank of the genotype when sorted by contribution to the GSS; and $A$ is the asymptote of the curve ($A = 100$).
The rate of increase in the cumulative contribution of successive genotypes to the GSS is given by the following equation, which is used to locate the point of inflection where the rate of increase starts declining.
$$\frac{dy}{dn} = \frac{b}{A}\, y \, (A - y)$$
The accessions ranked up to this peak or inflection point are selected to constitute the core collection.
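The three selection rules can be sketched as follows, given the cumulative percentage contribution `y` of the sorted genotypes. Since $dy/dn$ is maximal at $y = A/2$, i.e. where $\log_e(y/(A-y)) = 0$, the inflection point of the fitted curve lies at $n^{*} = -a/b$. The helpers are hypothetical, the rounding rules (`ceiling` for the fixed size, `round` for $n^{*}$) are assumptions that may differ from pcss.core, and the data are synthetic, generated from the logistic model with $a = -5$ and $b = 0.5$.

```r
# Hypothetical helpers for the three selection methods (not the rpcss API).
# y: cumulative % contribution to GSS of genotypes sorted in descending order.

# Method 1: fixed proportion of the collection (rounding rule is an assumption)
core_by_size <- function(N, size) {
  ceiling(N * size)
}

# Method 2: first rank at which the cumulative contribution reaches the target
core_by_var <- function(y, var.threshold) {
  which(y >= 100 * var.threshold)[1]
}

# Method 3: fit log(y / (A - y)) = a + b n by least squares; the peak of
# dy/dn is where y = A/2, i.e. n* = -a/b
core_by_logistic <- function(y, A = 100) {
  n <- seq_along(y)
  keep <- y > 0 & y < A          # the logit is undefined at y = 0 or y = A
  fit <- lm(log(y[keep] / (A - y[keep])) ~ n[keep])
  a <- unname(coef(fit)[1])
  b <- unname(coef(fit)[2])
  round(-a / b)                  # rounding rule is an assumption
}

# Synthetic cumulative curve following the logistic model (a = -5, b = 0.5)
n <- 1:30
y <- 100 / (1 + exp(-(-5 + 0.5 * n)))

core_by_size(length(y), 0.2)
core_by_var(y, 0.75)
core_by_logistic(y)
```

On this synthetic curve the three rules return 6, 13 and 10 accessions, respectively.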
Similarly, for qualitative traits, standardized and uncorrelated variables or factors can be obtained by Correspondence Analysis (CA) on the complete disjunctive table of genotype $\times$ trait data or, more specifically, by Multiple Correspondence Analysis (MCA). In rpcss, this has also been extended to data sets having both quantitative and qualitative traits by implementing Factor Analysis for Mixed Data (FAMD) to obtain standardized and uncorrelated variables or factors.
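A complete disjunctive table codes each qualitative trait as one 0/1 indicator column per level, so every row sums to the number of traits. A minimal base-R sketch, with a hypothetical helper `disjunctive_table` and made-up toy traits (not the cassava data):

```r
# Hypothetical helper: complete disjunctive (indicator) table with one
# 0/1 column per level of each qualitative trait.
disjunctive_table <- function(df) {
  blocks <- lapply(names(df), function(v) {
    f <- droplevels(as.factor(df[[v]]))
    m <- outer(as.character(f), levels(f), "==") * 1L   # indicator matrix
    colnames(m) <- paste(v, levels(f), sep = "_")
    m
  })
  do.call(cbind, blocks)
}

traits <- data.frame(colour = c("red", "green", "red", "white"),
                     shape  = c("round", "long", "long", "round"))
Z <- disjunctive_table(traits)
Z
rowSums(Z)   # each genotype has exactly one level per trait, so this is 2
```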
In rpcss, PCA, MCA and FAMD are implemented via the FactoMineR package [le_FactoMineR_2008; husson_Exploratory_2017].
A list of class pcss.core with the following components:

| Component | Description |
| --- | --- |
| details | The details of the core set generation process. |
| raw.out | The original output of the PCA, CA or FAMD function. |
| eigen | A data frame with eigenvalues and their partial and cumulative contributions to the percentage of variance. |
| eigen.threshold | The threshold eigenvalue used. |
| rotation | A matrix of rotation values or loadings. |
| scores | A matrix of scores from PCA, CA or FAMD. |
| variability.ret | A data frame of individuals/genotypes ordered by variability retained. |
| cores.info | A data frame of core set size and percentage variability retained according to the method used. |
See also PCA, CA and FAMD (FactoMineR).
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Prepare example data
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
suppressPackageStartupMessages(library(EvaluateCore))
library(rpcss)

# Get data from EvaluateCore
data("cassava_EC", package = "EvaluateCore")
data <- cbind(Genotypes = rownames(cassava_EC), cassava_EC)
quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC",
          "PSTR")
rownames(data) <- NULL

# Convert qualitative data columns to factor
data[, qual] <- lapply(data[, qual], as.factor)
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Get core sets with PCSS (quantitative data)
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
out1 <- pcss.core(data = data, names = "Genotypes",
                  quantitative = quant,
                  qualitative = NULL, eigen.threshold = NULL, size = 0.2,
                  var.threshold = 0.75)
out1
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Get core sets with PCSS (qualitative data)
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
out2 <- pcss.core(data = data, names = "Genotypes", quantitative = NULL,
                  qualitative = qual, eigen.threshold = NULL,
                  size = 0.2, var.threshold = 0.75)
out2
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Get core sets with PCSS (quantitative and qualitative data)
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
out3 <- pcss.core(data = data, names = "Genotypes",
                  quantitative = quant,
                  qualitative = qual, eigen.threshold = NULL)
out3