snpgdsPCASampLoading: Project individuals onto existing principal component axes
In SNPRelate: Parallel Computing Toolset for Genome-Wide Association Studies (GWAS)


Calculates the sample eigenvectors using the specified SNP loadings.

snpgdsPCASampLoading(loadobj, gdsobj, sample.id=NULL, num.thread=1, verbose=TRUE)

`loadobj`	the `snpgdsPCASNPLoadingClass` object, returned from snpgdsPCASNPLoading
`gdsobj`	a GDS file object (`gds.class`)
`sample.id`	a vector of sample IDs specifying the selected samples; if NULL, all samples are used
`num.thread`	the number of CPU cores used
`verbose`	if TRUE, show information

The samples given in `sample.id` are usually different from the samples used to calculate the SNP loadings.
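To illustrate why SNP loadings are enough to place samples on existing axes, here is a minimal base R sketch of the underlying linear algebra. It is not SNPRelate's internal implementation: the simulated genotypes, the standardization by allele frequency, and the names `X`, `W`, and `project` are all assumptions made for the illustration.

```r
# Conceptual sketch only (base R, simulated data); not SNPRelate's code.
set.seed(1)
n <- 20                                   # number of samples (hypothetical)
m <- 200                                  # number of SNPs (hypothetical)
p <- runif(m, 0.1, 0.9)                   # simulated allele frequencies
geno <- sapply(p, function(f) rbinom(n, 2, f))   # n x m matrix of 0/1/2

# Drop monomorphic SNPs, then standardize each SNP (assumed normalization).
freq <- colMeans(geno) / 2
keep <- freq > 0 & freq < 1
geno <- geno[, keep, drop = FALSE]
freq <- freq[keep]
X <- sweep(geno, 2, 2 * freq, "-")
X <- sweep(X, 2, sqrt(freq * (1 - freq)), "/")
m.used <- ncol(X)

# PCA on the n x n sample covariance matrix.
G <- tcrossprod(X) / m.used
eg <- eigen(G, symmetric = TRUE)
k <- 4
V <- eg$vectors[, 1:k, drop = FALSE]      # sample eigenvectors
lambda <- eg$values[1:k]                  # matching eigenvalues

# SNP loadings: one row per SNP, one column per component.
W <- crossprod(X, V)

# Project standardized genotypes onto the existing axes.
# Since X %*% W = m.used * V %*% diag(lambda), rescaling recovers V.
project <- function(Xnew) sweep(Xnew %*% W, 2, m.used * lambda, "/")

V.proj <- project(X[1:10, , drop = FALSE])
max.diff <- max(abs(V.proj - V[1:10, , drop = FALSE]))
print(max.diff)
```

For samples that were part of the original decomposition, the projection reproduces their eigenvector rows up to floating-point error. For new samples, the same `project` step would be applied to genotypes standardized with the allele frequencies of the original samples.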

Returns a `snpgdsPCAClass` object, which is a list with the following components:

`sample.id`	the sample ids used in the analysis
`snp.id`	the SNP ids used in the analysis
`eigenval`	eigenvalues
`eigenvect`	eigenvectors, “# of samples” x “eigen.cnt”
`TraceXTX`	the trace of the genetic covariance matrix
`Bayesian`	whether Bayesian normalization is used

Xiuwen Zheng

Patterson N, Price AL, Reich D (2006) Population structure and eigenanalysis. PLoS Genetics 2:e190.

Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D (2006) Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet. 38, 904-909.

Zhu, X., Li, S., Cooper, R. S., and Elston, R. C. (2008). A unified association analysis approach for family and unrelated samples correcting for stratification. Am J Hum Genet, 82(2), 352-365.

snpgdsPCA, snpgdsPCACorr, snpgdsPCASNPLoading

library(SNPRelate)

# open an example dataset (HapMap)
genofile <- openfn.gds(snpgdsExampleFileName())

sample.id <- read.gdsn(index.gdsn(genofile, "sample.id"))

PCARV <- snpgdsPCA(genofile, eigen.cnt=8)
SnpLoad <- snpgdsPCASNPLoading(PCARV, genofile)

# calculate sample eigenvectors from SNP loadings
SL <- snpgdsPCASampLoading(SnpLoad, genofile, sample.id=sample.id[1:100])

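# compare with the eigenvectors of the same samples from the original PCA
diff <- PCARV$eigenvect[1:100,] - SL$eigenvect
summary(c(diff))
# all differences are approximately zero (floating-point error only)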

# close the genotype file
closefn.gds(genofile)

Loading required package: gdsfmt
SNPRelate -- supported by Streaming SIMD Extensions 2 (SSE2)
Hint: it is suggested to call `snpgdsOpen' to open a SNP GDS file instead of `openfn.gds'.
Principal Component Analysis (PCA) on genotypes:
Excluding 365 SNPs on non-autosomes
Excluding 1 SNP (monomorphic: TRUE, MAF: NaN, missing rate: NaN)
Working space: 279 samples, 8,722 SNPs
    using 1 (CPU) core
PCA:    the sum of all selected genotypes (0,1,2) = 2446510
CPU capabilities: Double-Precision SSE2
Thu Dec  7 09:10:20 2017    (internal increment: 408)

[..................................................]  0%, ETC: ---    
[==================================================] 100%, completed in 0s
Thu Dec  7 09:10:20 2017    Begin (eigenvalues and eigenvectors)
Thu Dec  7 09:10:20 2017    Done.
Hint: it is suggested to call `snpgdsOpen' to open a SNP GDS file instead of `openfn.gds'.
SNP loading:
Working space: 279 samples, 8722 SNPs
    using 1 (CPU) core
    using the top 8 eigenvectors
SNP Loading:    the sum of all selected genotypes (0,1,2) = 2446510
Thu Dec  7 09:10:20 2017    (internal increment: 3288)

[..................................................]  0%, ETC: ---    
[==================================================] 100%, completed in 0s
Thu Dec  7 09:10:20 2017    Done.
Hint: it is suggested to call `snpgdsOpen' to open a SNP GDS file instead of `openfn.gds'.
Sample loading:
Working space: 100 samples, 8722 SNPs
    using 1 (CPU) core
    using the top 8 eigenvectors
Sample Loading:    the sum of all selected genotypes (0,1,2) = 878146
Thu Dec  7 09:10:21 2017    (internal increment: 9172)

[..................................................]  0%, ETC: ---    
[==================================================] 100%, completed in 0s
Thu Dec  7 09:10:21 2017    Done.
      Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
-8.882e-16 -6.939e-17 -1.735e-18  2.873e-17  6.700e-17  3.553e-15
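The example above projects samples that were already part of the PCA. As a sketch of the more typical case, where the projected samples differ from those used to calculate the SNP loadings, the following splits the example dataset into a reference set and a projection set. The split at sample 200 and the names `ref.id`, `new.id`, `pca.ref`, `load.ref`, and `proj.new` are arbitrary choices for illustration.

```r
library(SNPRelate)

# open the example dataset (HapMap)
genofile <- snpgdsOpen(snpgdsExampleFileName())
sample.id <- read.gdsn(index.gdsn(genofile, "sample.id"))

# arbitrary split: reference samples for the PCA, remaining samples to project
ref.id <- sample.id[1:200]
new.id <- sample.id[201:length(sample.id)]

# PCA and SNP loadings from the reference samples only
pca.ref <- snpgdsPCA(genofile, sample.id=ref.id, eigen.cnt=8, verbose=FALSE)
load.ref <- snpgdsPCASNPLoading(pca.ref, genofile, verbose=FALSE)

# project the remaining samples onto the reference axes
proj.new <- snpgdsPCASampLoading(load.ref, genofile, sample.id=new.id,
    verbose=FALSE)

# one row per projected sample, one column per eigenvector
dim(proj.new$eigenvect)

# close the genotype file
snpgdsClose(genofile)
```

The rows of `proj.new$eigenvect` can then be plotted together with `pca.ref$eigenvect` to see where the new samples fall relative to the reference samples.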