PCA_biplot    R Documentation
PCA_biplot() creates a PCA (Principal Component Analysis) biplot with loadings for the new index rYWAASB, used for the simultaneous selection of genotypes by a trait and the WAASB index. It displays the rYWAASB, rWAASB and rWAASBY indices (r: ranked) together in a single biplot, allowing a better differentiation of genotypes. In the PCA biplots, the variables are colored according to their contributions (contrib) and their cos2 (quality of representation).
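Independently of PCA_biplot(), coloring variables by their contributions and cos2 is commonly done with the factoextra package; the sketch below is only an assumption about that general approach (df and res.pca are placeholder names, not objects from this package):

# Minimal sketch (assumption): a PCA biplot with variables colored by contribution
library(factoextra)

df      <- iris[, 1:4]                                # placeholder numeric data
res.pca <- prcomp(df, center = TRUE, scale. = TRUE)

fviz_pca_biplot(res.pca,
                col.var = "contrib",                  # color variables by contribution
                gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
                repel = TRUE)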
PCA_biplot(datap, lowt = FALSE)
datap: The data set.

lowt: Logical; indicates whether lower values of the trait are preferred. For grain yield, for example, higher values are preferred (lowt = FALSE), whereas for plant height lower values are preferred (lowt = TRUE).
PCA is a machine learning and dimension reduction technique. It is used to simplify large data sets by extracting a smaller set of variables that preserves the significant patterns and trends (1).
According to Johnson and Wichern (2007), PCA explains the variance-covariance structure of a set of variables $X_1, X_2, \ldots, X_p$ through a few linear combinations of these variables. The common objectives of PCA are (1) data reduction and (2) interpretation.
Biplot and PCA: The biplot is a method used to visually represent both the rows and columns of a data table. It approximates the table by a two-dimensional matrix product, with the aim of creating a plane that represents both the rows and the columns. The techniques used in a biplot typically involve an eigen decomposition, similar to the one used in PCA, and the biplot is commonly computed on mean-centered and scaled data (2). For scaling, each variable can be transformed as

$$z = \frac{x - \bar{x}}{s(x)}$$

where $s(x)$ denotes the sample standard deviation of $x$, calculated as

$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}.$$

Algebra of PCA: As Johnson and Wichern (2007) state (3), let the random vector $\mathbf{X}' = [X_1, X_2, \ldots, X_p]$ have the covariance matrix $\boldsymbol{\Sigma}$ with eigenvalues $\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_p \ge 0$.
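The standardization above corresponds to R's scale(); the following sketch (an illustration only, with a placeholder data frame df that is not part of this package) checks the equivalence:

# Minimal sketch (assumption): column-wise standardization z = (x - mean) / s(x)
df <- iris[, 1:4]                                      # placeholder numeric data

z_manual <- sweep(df, 2, colMeans(df), "-")            # center each column
z_manual <- sweep(z_manual, 2, apply(df, 2, sd), "/")  # divide by the sample sd
z_scale  <- scale(df, center = TRUE, scale = TRUE)     # same transformation via scale()

all.equal(as.matrix(z_manual), z_scale, check.attributes = FALSE)  # TRUE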
Regarding the linear combinations:

$$Y_1 = \mathbf{a}'_1\mathbf{X} = a_{11}X_1 + a_{12}X_2 + \ldots + a_{1p}X_p$$
$$Y_2 = \mathbf{a}'_2\mathbf{X} = a_{21}X_1 + a_{22}X_2 + \ldots + a_{2p}X_p$$
$$\vdots$$
$$Y_p = \mathbf{a}'_p\mathbf{X} = a_{p1}X_1 + a_{p2}X_2 + \ldots + a_{pp}X_p$$

where

$$Var(Y_i) = \mathbf{a}'_i\boldsymbol{\Sigma}\mathbf{a}_i, \quad i = 1, 2, \ldots, p$$
$$Cov(Y_i, Y_k) = \mathbf{a}'_i\boldsymbol{\Sigma}\mathbf{a}_k, \quad i, k = 1, 2, \ldots, p.$$
The principal components are the uncorrelated linear combinations $Y_1, Y_2, \ldots, Y_p$ whose variances are as large as possible.

For the random vector $\mathbf{X}' = [X_1, X_2, \ldots, X_p]$ with associated covariance matrix $\boldsymbol{\Sigma}$, let $\boldsymbol{\Sigma}$ have the eigenvalue-eigenvector pairs $(\lambda_1, \mathbf{e}_1), (\lambda_2, \mathbf{e}_2), \ldots, (\lambda_p, \mathbf{e}_p)$ with, as above, $\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_p \ge 0$.

Then the $i$th principal component is

$$Y_i = \mathbf{e}'_i\mathbf{X} = e_{i1}X_1 + e_{i2}X_2 + \ldots + e_{ip}X_p, \quad i = 1, 2, \ldots, p,$$

where

$$Var(Y_i) = \mathbf{e}'_i\boldsymbol{\Sigma}\mathbf{e}_i = \lambda_i, \quad i = 1, 2, \ldots, p$$
$$Cov(Y_i, Y_k) = \mathbf{e}'_i\boldsymbol{\Sigma}\mathbf{e}_k = 0, \quad i \neq k,$$

and

$$\sigma_{11} + \sigma_{22} + \ldots + \sigma_{pp} = \sum_{i=1}^{p}Var(X_i) = \lambda_1 + \lambda_2 + \ldots + \lambda_p = \sum_{i=1}^{p}Var(Y_i).$$

That is, the total population variance $\sigma_{11} + \sigma_{22} + \ldots + \sigma_{pp}$ equals $\lambda_1 + \lambda_2 + \ldots + \lambda_p$.
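These identities can be verified numerically with a short sketch (an illustration only; the sample covariance matrix S below stands in for $\boldsymbol{\Sigma}$ and is estimated from placeholder data):

# Minimal sketch (assumption): principal components via an eigen decomposition
X <- as.matrix(iris[, 1:4])        # placeholder data
S <- cov(X)                        # sample covariance matrix, standing in for Sigma

eig    <- eigen(S)                 # eigenvalue-eigenvector pairs (lambda_i, e_i)
lambda <- eig$values
E      <- eig$vectors

Y <- scale(X, center = TRUE, scale = FALSE) %*% E  # scores Y_i = e_i'(X - mean)

round(apply(Y, 2, var), 4)                         # Var(Y_i) ...
round(lambda, 4)                                   # ... equals lambda_i
all.equal(sum(diag(S)), sum(lambda))               # total variance is preserved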
Other quantities that are important in a PCA are:

The proportion of the total variance explained by the $k$th principal component:

$$\frac{\lambda_k}{\lambda_1 + \lambda_2 + \ldots + \lambda_p}, \quad k = 1, 2, \ldots, p.$$

The correlation coefficients between the components $Y_i$ and the variables $X_k$:

$$\rho_{Y_i, X_k} = \frac{e_{ik}\sqrt{\lambda_i}}{\sqrt{\sigma_{kk}}}, \quad i, k = 1, 2, \ldots, p.$$
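Both quantities can be computed directly from a fitted PCA; the sketch below (an illustration using prcomp() on placeholder data, not code from this package) evaluates the proportions and the component-variable correlations:

# Minimal sketch (assumption): explained-variance proportions and
# component-variable correlations rho(Y_i, X_k) from prcomp()
X   <- iris[, 1:4]                                # placeholder data
pca <- prcomp(X, center = TRUE, scale. = FALSE)

lambda <- pca$sdev^2                              # eigenvalues lambda_k
round(lambda / sum(lambda), 4)                    # proportion of total variance

# rho(Y_i, X_k) = e_ik * sqrt(lambda_i) / sqrt(sigma_kk)
rho <- sweep(pca$rotation, 2, sqrt(lambda), "*")  # multiply column i by sqrt(lambda_i)
rho <- sweep(rho, 1, sqrt(diag(cov(X))), "/")     # divide row k by sqrt(sigma_kk)
round(t(rho), 3)                                  # rows: components, columns: variables

round(cor(pca$x, X), 3)                           # same values via score-variable correlations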
Note that PCA can be performed on either the covariance matrix or the correlation matrix, and the data should generally be centered before performing PCA.
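In prcomp(), centering and scaling the data reproduce these two choices; the sketch below (an illustration only, not part of PCA_biplot()) contrasts them:

# Minimal sketch (assumption): covariance-based vs. correlation-based PCA
X <- iris[, 1:4]                                     # placeholder data

pca_cov <- prcomp(X, center = TRUE, scale. = FALSE)  # PCA on the covariance matrix
pca_cor <- prcomp(X, center = TRUE, scale. = TRUE)   # PCA on the correlation matrix

round(pca_cov$sdev^2, 3)                             # eigenvalues differ because the
round(pca_cor$sdev^2, 3)                             # variables are weighted differently

all.equal(pca_cor$sdev^2, eigen(cor(X))$values)      # matches eigen() on cor(X)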
Returns a list of data frames.
Ali Arminian abeyran@gmail.com
(2) https://pca4ds.github.io/biplot-and-pca.html.
(3) Johnson, R.A. and Wichern, D.W. 2007. Applied Multivariate Statistical Analysis. Pearson Prentice Hall. 773 p.
# Case 1: for maize dataset, grain yield
data(maize)
PCA_biplot(maize) # or: PCA_biplot(maize, lowt = FALSE)
# Case 2: for days to maturity (dm) trait of chickpea
data(dm)
PCA_biplot(dm, lowt = TRUE)