pcss.core: Principal Component Scoring to Generate Core collections

View source: R/pcss.core.R

pcss.core    R Documentation

Principal Component Scoring to Generate Core collections

Description

Generate a core collection with the Principal Component Scoring Strategy (PCSS) \insertCite{hamon_proposed_1990,noirot_principal_1996,noirot_method_2003}{rpcss} using qualitative and/or quantitative trait data. \loadmathjax

Usage

pcss.core(
  data,
  names,
  quantitative,
  qualitative,
  eigen.threshold = NULL,
  size = 0.2,
  var.threshold = 0.75
)

Arguments

data

The data as a data frame object. The data frame should possess one row per individual and columns with the individual names and multiple trait/character data.

names

Name of column with the individual/genotype names as a character string.

quantitative

Names of the columns with the quantitative traits as a character vector.

qualitative

Names of the columns with the qualitative traits as a character vector.

eigen.threshold

The lower limit of the eigen value of factors to be included in the estimation. The default value is the average of all the eigen values.

size

The desired core set size proportion.

var.threshold

The desired proportion of total variability to be

Details

A core collection is constituted from an entire collection of \mjseqn{N} genotypes using quantitative data on \mjseqn{J} traits by the Principal Component Scoring Strategy (PCSS) \insertCite{hamon_proposed_1990,noirot_principal_1996,noirot_method_2003}{rpcss} as follows:

  1. Principal Component Analysis (PCA) is performed on the standardized genotype \mjseqn{\times} trait data. This takes care of multicollinearity between the traits and generates \mjseqn{J} standardized and independent variables, also called factors or principal components.

  2. Considering only a subset of \mjseqn{K} factors, the Generalized Sum of Squares (GSS) of the \mjseqn{N} individuals in the \mjseqn{K} factorial spaces is computed as \mjseqn{N \times K}.

    \mjseqn{K} can be the number of factors for which the eigen value \mjseqn{\lambda} is greater than a threshold value such as 1 (Kaiser-Guttman criterion) or the average of all the eigen values.

  3. The contribution of the \mjseqn{i}th genotype to GSS (\mjseqn{P_{i}}) or total variability is calculated as below.

    \mjsdeqn{P_{i} = \sum_{j = 1}^{K} x_{ij}^{2}}

    Where \mjseqn{x_{ij}} is the component score or coordinate of the \mjseqn{i}th genotype on the \mjseqn{j}th principal component.

  4. For each genotype, its relative contribution to GSS or total variability (\mjseqn{CR_{i}}) is computed as below.

    \mjsdeqn{CR_{i} = \frac{P_{i}}{N \times K}}

  5. The genotypes are sorted in descending order of magnitude of their contribution to GSS and then the cumulative contribution of successive genotypes to GSS is computed.

  6. The core collection can then be selected by three different methods.

    1. Selection of fixed proportion or percentage or number of the top accessions.

    2. Selection of the top accessions that contribute up to a fixed percentage of the GSS.

    3. Fitting a logistic regression model of the following form to the cumulative contribution of successive genotypes to GSS \insertCite{balakrishnan_method_2000}{rpcss}.

      \mjsdeqn{\frac{y}{A-y} = e^{a + bn}}

      The above equation can be reparameterized as below.

      \mjsdeqn{\log_{e} \left( \frac{y}{A-y} \right) = a + bn}

      Where \mjseqn{a} and \mjseqn{b} are the intercept and regression coefficient, respectively; \mjseqn{y} is the cumulative contribution of successive genotypes to GSS; \mjseqn{n} is the rank of the genotype when sorted according to the contribution to GSS; and \mjseqn{A} is the asymptote of the curve (\mjseqn{A = 100}).

      The rate of increase in the successive contribution of genotypes to GSS can be computed by the following equation to find the point of inflection, where the rate of increase starts declining.

      \mjsdeqn{\frac{\mathrm{d} y}{\mathrm{d} n} = by(A-y)}

      The accessions ranked up to the peak or inflection point are selected to constitute the core collection.
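The steps above can be sketched in base R. This is a minimal illustration under stated assumptions, not the code used by pcss.core: it uses prcomp() instead of FactoMineR, takes the built-in iris data as a stand-in for genotype \mjseqn{\times} trait data, rescales the retained scores to unit variance so that GSS equals \mjseqn{N \times K}, and reads "up to a fixed percentage" as the smallest number of accessions whose cumulative contribution reaches the threshold. All object names are hypothetical.

```r
# Illustrative data (an assumption): four quantitative traits from iris
traits <- iris[, 1:4]
N <- nrow(traits)

# Step 1: PCA on the standardized genotype x trait data
pca <- prcomp(traits, center = TRUE, scale. = TRUE)
eig <- pca$sdev^2  # eigen values; they average 1 for standardized traits

# Step 2: retain the K factors whose eigen value exceeds the average
K <- sum(eig > mean(eig))
scores <- pca$x[, seq_len(K), drop = FALSE]
# Rescale each retained factor to unit variance (divisor N) so that GSS = N * K
scores <- sweep(scores, 2, sqrt(colMeans(scores^2)), "/")
GSS <- N * K

# Steps 3 and 4: contribution P_i and relative contribution CR_i
P <- rowSums(scores^2)
CR <- P / GSS

# Step 5: sort in descending order and accumulate (in percent)
ord <- order(CR, decreasing = TRUE)
cum_pct <- 100 * cumsum(CR[ord])

# Method 1: fixed proportion of the top accessions
size <- 0.2
n_size <- round(size * N)
core_ids <- ord[seq_len(n_size)]

# Method 2: top accessions reaching a fixed percentage of GSS
var_threshold <- 0.75
n_var <- which(cum_pct >= 100 * var_threshold)[1]

# Method 3: logistic regression on the cumulative contribution
A <- 100
d <- data.frame(n = seq_len(N - 1))  # last rank dropped: y = A gives an infinite logit
d$logit <- log(cum_pct[d$n] / (A - cum_pct[d$n]))
d <- d[is.finite(d$logit), ]
fit <- lm(logit ~ n, data = d)
a <- coef(fit)[[1]]
b <- coef(fit)[[2]]

# Rate of increase b * y * (A - y) on the fitted curve; its peak is the inflection point
y_hat <- A / (1 + exp(-(a + b * seq_len(N))))
rate <- b * y_hat * (A - y_hat)
n_logistic <- which.max(rate)

c(by_size = n_size, by_variability = n_var, by_logistic = n_logistic)
```

The three counts generally differ, because the first is fixed in advance while the other two depend on how unevenly the contributions are distributed among the genotypes.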

Similarly, for qualitative traits, standardized and independent variables or factors can be obtained by Correspondence Analysis (CA) on the complete disjunctive table of genotype \mjseqn{\times} trait data or, to be specific, Multiple Correspondence Analysis (MCA). In rpcss, this has also been extended to data sets having both quantitative and qualitative traits by implementing Factor Analysis for Mixed Data (FAMD) to obtain standardized and independent variables or factors.
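To make the term concrete, a complete disjunctive table has one 0/1 indicator column per level of every qualitative trait. The base R sketch below builds one for a made-up data set; the trait names, levels and the helper indicator() are hypothetical and are not part of rpcss.

```r
# Made-up qualitative data for four genotypes (an assumption for illustration)
traits_qual <- data.frame(
  flower_colour = factor(c("red", "white", "red", "pink")),
  leaf_shape    = factor(c("round", "oval", "oval", "round"))
)

# Hypothetical helper: one 0/1 column per level of a factor
indicator <- function(f) {
  m <- 1 * outer(as.character(f), levels(f), "==")
  colnames(m) <- levels(f)
  m
}

# Bind the indicator blocks of all traits side by side
cdt <- do.call(cbind, lapply(names(traits_qual), function(v) {
  m <- indicator(traits_qual[[v]])
  colnames(m) <- paste(v, colnames(m), sep = ".")
  m
}))
rownames(cdt) <- paste0("G", seq_len(nrow(traits_qual)))

cdt
```

Each row sums to the number of traits, since every genotype takes exactly one level per trait.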

In rpcss, PCA, MCA and FAMD are implemented via the FactoMineR package \insertCite{le_FactoMineR_2008,husson_Exploratory_2017}{rpcss}.
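For orientation, the three FactoMineR analyses can be called directly as sketched below. The choice of the built-in mtcars data, the columns treated as qualitative and the argument values are assumptions made for illustration; how pcss.core itself calls these functions is not shown here.

```r
if (requireNamespace("FactoMineR", quietly = TRUE)) {
  # Illustrative split of mtcars into quantitative and qualitative traits
  quanti <- mtcars[, c("mpg", "hp", "wt")]
  quali <- data.frame(lapply(mtcars[, c("cyl", "gear", "am")], factor))

  # Quantitative traits only: PCA on standardized data
  res_pca <- FactoMineR::PCA(quanti, scale.unit = TRUE, graph = FALSE)

  # Qualitative traits only: MCA
  res_mca <- FactoMineR::MCA(quali, graph = FALSE)

  # Both kinds of traits: FAMD
  res_famd <- FactoMineR::FAMD(cbind(quanti, quali), graph = FALSE)

  # Eigen values and individual coordinates (scores) of the PCA
  print(res_pca$eig)
  print(head(res_pca$ind$coord))
}
```

Each result stores its eigen values in the eig element and the coordinates of the individuals in ind$coord, which are the quantities the scoring steps operate on.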

Value

A list of class pcss.core with the following components.

details

The details of the core set generation process.

raw.out

The original output of the PCA, CA and FAMD functions of FactoMineR.

eigen

A data frame with eigen values and their partial and cumulative contribution to percentage of variance.

eigen.threshold

The threshold eigen value used.

rotation

A matrix of rotation values or loadings.

scores

A matrix of scores from PCA, CA or FAMD.

variability.ret

A data frame of individuals/genotypes ordered by variability retained.

cores.info

A data frame of core set size and percentage variability retained according to the method used.

References

\insertAllCited

See Also

PCA, CA and FAMD

Examples


#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Prepare example data
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

suppressPackageStartupMessages(library(EvaluateCore))

# Get data from EvaluateCore

data("cassava_EC", package = "EvaluateCore")
data <- cbind(Genotypes = rownames(cassava_EC), cassava_EC)
quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC",
          "PSTR")
rownames(data) <- NULL

# Convert qualitative data columns to factor
data[, qual] <- lapply(data[, qual], as.factor)

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Get core sets with PCSS (quantitative data)
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

out1 <- pcss.core(data = data, names = "Genotypes",
                  quantitative = quant,
                  qualitative = NULL, eigen.threshold = NULL, size = 0.2,
                  var.threshold = 0.75)

out1

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Get core sets with PCSS (qualitative data)
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

out2 <- pcss.core(data = data, names = "Genotypes", quantitative = NULL,
                  qualitative = qual, eigen.threshold = NULL,
                  size = 0.2, var.threshold = 0.75)

out2

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Get core sets with PCSS (quantitative and qualitative data)
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

out3 <- pcss.core(data = data, names = "Genotypes",
                  quantitative = quant,
                  qualitative = qual, eigen.threshold = NULL)

out3



rpcss documentation built on April 3, 2025, 10:57 p.m.