| PCA_biplot | R Documentation |
PCA_biplot() creates a PCA (Principal Component
Analysis) biplot with loadings for the new index rYWAASB,
for simultaneous selection of genotypes by trait and by the WAASB index.
It displays the rYWAASB, rWAASB and rWAASBY indices (r: ranked)
together in one biplot, for better differentiation of genotypes.
In the PCA biplots, the variables are colored according to
their contributions (contrib) and cos2 (squared cosine) values.
PCA_biplot(datap, lowt = FALSE)
datap |
The data set |
lowt |
A logical parameter indicating whether lower values of the trait are preferred. For grain yield, for example, higher values are preferred (lowt = FALSE); for plant height, lower values are preferred (lowt = TRUE). |
PCA is a machine-learning and dimension-reduction
technique. It is used to simplify large data sets by
extracting a smaller set of components that preserves
the significant patterns and trends (1).
According to Johnson and Wichern (2007), PCA explains
the variance-covariance structure of a set of variables
\loadmathjax
\mjseqn{X_1, X_2, \ldots, X_p} with a few linear
combinations of these variables. Moreover, the common
objectives of PCA are (1) data reduction and (2) interpretation.
Biplot and PCA: The biplot is a method used to visually represent both the rows and columns of a data table. It involves approximating the table using a two-dimensional matrix product, with the aim of creating a plane that represents the rows and columns. The techniques used in a biplot typically involve an eigendecomposition, similar to the one used in PCA. It is common for the biplot to be computed on mean-centered and scaled data (2). For scaling, each variable can be transformed as follows: \mjsdeqn{z = \frac{x - \bar{x}}{s(x)}} where \mjseqn{s(x)} denotes the sample standard deviation of the variable \mjseqn{x}, calculated as: \mjsdeqn{s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}} Algebra of PCA: As Johnson and Wichern (2007) state (3), let the random vector \mjseqn{\mathbf{X}' = [X_1, X_2, \ldots, X_p]} have the covariance matrix \mjseqn{\mathbf{\Sigma}} with eigenvalues \mjseqn{\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_p \ge 0}.
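As a side illustration (not part of the package), the standardization described above can be checked in base R, where scale() performs the same mean-centering and scaling; the vector x below is made-up example data:

```r
# Standardize a variable: z = (x - mean(x)) / s(x),
# where s(x) is the sample standard deviation (n - 1 denominator).
x <- c(2.1, 3.5, 4.0, 5.2, 6.8)   # illustrative data, not from the package

z_manual <- (x - mean(x)) / sd(x)   # manual standardization
z_scale  <- as.numeric(scale(x))    # base R's scale() does the same

all.equal(z_manual, z_scale)        # TRUE
```

After this transformation each variable has mean 0 and sample standard deviation 1, so a PCA of the scaled data is effectively a PCA of the correlation matrix.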
Regarding the linear combinations: \mjsdeqn{Y_1 = \mathbf{a}'_1\mathbf{X} = a_{11}X_1 + a_{12}X_2 + \ldots + a_{1p}X_p} \mjsdeqn{Y_2 = \mathbf{a}'_2\mathbf{X} = a_{21}X_1 + a_{22}X_2 + \ldots + a_{2p}X_p} \mjsdeqn{\vdots} \mjsdeqn{Y_p = \mathbf{a}'_p\mathbf{X} = a_{p1}X_1 + a_{p2}X_2 + \ldots + a_{pp}X_p}
where \mjseqn{Var(Y_i) = \mathbf{a}'_i\mathbf{\Sigma}\mathbf{a}_i, \; i = 1, 2, \ldots, p} and \mjseqn{Cov(Y_i, Y_k) = \mathbf{a}'_i\mathbf{\Sigma}\mathbf{a}_k, \; i, k = 1, 2, \ldots, p}.
The principal components are the uncorrelated linear combinations \mjseqn{Y_1, Y_2, \ldots, Y_p} whose variances are as large as possible.
For the random vector \mjseqn{\mathbf{X}' = \left[ X_1, X_2, \ldots, X_p \right]} with associated covariance matrix \mjseqn{\mathbf{\Sigma}}, let \mjseqn{\mathbf{\Sigma}} have the eigenvalue-eigenvector pairs \mjseqn{(\lambda_1, e_1), (\lambda_2, e_2), \ldots, (\lambda_p, e_p)}, where, as above, \mjseqn{\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_p \ge 0}.
Then the \mjseqn{i}th principal component is: \mjsdeqn{Y_i = \mathbf{e}'_i\mathbf{X} = e_{i1}X_1 + e_{i2}X_2 + \ldots + e_{ip}X_p, \; i = 1, 2, \ldots, p} where \mjseqn{Var(Y_i) = \mathbf{e}'_i\mathbf{\Sigma}\mathbf{e}_i = \lambda_i, \; i = 1, 2, \ldots, p} and \mjseqn{Cov(Y_i, Y_k) = \mathbf{e}'_i\mathbf{\Sigma}\mathbf{e}_k = 0, \; i \neq k}, and: \mjseqn{\sigma_{11} + \sigma_{22} + \ldots + \sigma_{pp} = \sum_{i=1}^{p}Var(X_i) = \lambda_1 + \lambda_2 + \ldots + \lambda_p = \sum_{i=1}^{p}Var(Y_i)}.
Interestingly, the total population variance \mjseqn{= \sigma_{11} + \sigma_{22} + \ldots + \sigma_{pp} = \lambda_1 + \lambda_2 + \ldots + \lambda_p}.
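These identities can be verified numerically in base R with eigen(); the simulated data below are illustrative and not part of the package:

```r
# Verify Var(Y_i) = lambda_i and the total-variance identity.
set.seed(1)
X <- matrix(rnorm(200 * 3), ncol = 3)   # illustrative data
S <- cov(X)                             # sample covariance matrix
eig <- eigen(S)                         # pairs (lambda_i, e_i)

# PC scores: Y = centered X times the eigenvector matrix
Y <- scale(X, center = TRUE, scale = FALSE) %*% eig$vectors

# Column variances of Y equal the eigenvalues of S
max(abs(apply(Y, 2, var) - eig$values))   # ~ 0

# Total variance: trace(S) = sum of the eigenvalues
all.equal(sum(diag(S)), sum(eig$values))  # TRUE
```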
Other quantities of interest in a PCA are:
The proportion of total variance due to (explained by) the \mjseqn{k}th principal component: \mjsdeqn{\frac{\lambda_k}{\lambda_1 + \lambda_2 + \ldots + \lambda_p}, \; k = 1, 2, \ldots, p}
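A quick base-R sketch of this proportion (illustrative data), cross-checked against prcomp():

```r
# Proportion of variance explained: lambda_k / sum(lambda)
set.seed(2)
X <- matrix(rnorm(100 * 4), ncol = 4)   # illustrative data

lambda <- eigen(cov(X))$values
prop <- lambda / sum(lambda)

pc <- prcomp(X)                              # covariance-based PCA
all.equal(prop, pc$sdev^2 / sum(pc$sdev^2))  # TRUE
```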
The correlation coefficient between the component \mjseqn{Y_i} and the variable \mjseqn{X_k} is as follows: \mjseqn{\rho_{Y_i, X_k} = \frac{e_{ik}\sqrt{\lambda_i}}{\sqrt{\sigma_{kk}}}, \; i, k = 1, 2, \ldots, p}
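This formula can likewise be checked numerically (illustrative data; cor(X, Y) gives the direct correlations between the variables and the component scores):

```r
# rho_{Y_i, X_k} = e_ik * sqrt(lambda_i) / sqrt(sigma_kk)
set.seed(3)
X <- matrix(rnorm(500 * 3), ncol = 3)   # illustrative data
S <- cov(X)
eig <- eigen(S)
Y <- scale(X, center = TRUE, scale = FALSE) %*% eig$vectors

# Element [k, i]: e_ik * sqrt(lambda_i) / sqrt(sigma_kk)
rho_formula <- sweep(eig$vectors, 2, sqrt(eig$values), `*`) / sqrt(diag(S))
rho_direct  <- cor(X, Y)

all.equal(rho_formula, rho_direct, check.attributes = FALSE)  # TRUE
```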
Please note that PCA can be performed on either the
covariance or the correlation matrix, and that the data
should generally be centered beforehand.
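In base R, prcomp() illustrates this choice: it centers the data by default, and scale. = TRUE switches from the covariance to the correlation matrix (shown here on the built-in iris data, not a package data set):

```r
X <- iris[, 1:4]                      # built-in numeric example data

pc_cov <- prcomp(X)                   # covariance-based (centered only)
pc_cor <- prcomp(X, scale. = TRUE)    # correlation-based (centered + scaled)

# Correlation-based eigenvalues match eigen() applied to cor(X)
all.equal(pc_cor$sdev^2, eigen(cor(X))$values)  # TRUE
```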
Returns a list of data frames.
Ali Arminian abeyran@gmail.com
(2) https://pca4ds.github.io/biplot-and-pca.html.
(3) Johnson, R.A. and Wichern, D.W. 2007. Applied Multivariate Statistical Analysis. Pearson Prentice Hall. 773 p.
# Case 1: for maize dataset, grain yield
data(maize)
PCA_biplot(maize) # or: PCA_biplot(maize, lowt = FALSE)
# Case 2: for days to maturity (dm) trait of chickpea
data(dm)
PCA_biplot(dm, lowt = TRUE)