snpgdsPCACorr: PC-correlated SNPs in principal component analysis


View source: R/PCA.R

Description

Calculates the correlations between eigenvectors and SNP genotypes.

Usage

snpgdsPCACorr(pcaobj, gdsobj, snp.id=NULL, eig.which=NULL, num.thread=1L,
    with.id=TRUE, outgds=NULL, verbose=TRUE)

Arguments

pcaobj

an object of class snpgdsPCAClass returned by snpgdsPCA, an object of class snpgdsEigMixClass returned by snpgdsEIGMIX, or an eigenvector matrix with sample IDs as row names

gdsobj

an object of class SNPGDSFileClass, a SNP GDS file

snp.id

a vector of SNP IDs specifying the selected SNPs; if NULL, all SNPs are used

eig.which

a vector of integers specifying which eigenvectors to use

num.thread

the number of (CPU) cores used; if NA, detect the number of cores automatically

with.id

if TRUE, the returned value includes sample.id and snp.id

outgds

NULL, or a file name (character) for exporting the correlations to a GDS file; see Details

verbose

if TRUE, show progress information

Details

If an output file name is specified via outgds, the nodes "sample.id", "snp.id" and "correlation" are stored in the GDS file. The GDS node "correlation" is a matrix of correlation coefficients stored in packed real format ("packedreal16", preserving 4 decimal digits; 0.0001 is the smallest number greater than zero; see add.gdsn).
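To get a feel for what "preserving 4 digits" means for the stored correlations, the sketch below emulates the rounding in base R. It assumes that packedreal16 behaves like storing round(x / 0.0001) as a 16-bit integer and multiplying back by 0.0001 on read; the helper pack_real16 is hypothetical and is not the gdsfmt implementation (see add.gdsn for the actual format). Under this assumption the rounding error of any value is at most half a step, 0.00005, which is in line with the summary of m - cr$snpcorr in the example output.

```r
# Hypothetical helper: emulate a packed 16-bit real with step 1e-4 and offset 0.
# Illustration only; the real encoding is defined by gdsfmt (see add.gdsn).
pack_real16 <- function(x, step = 1e-4) {
    # correlations lie in [-1, 1], so round(x / step) stays within +/-10000,
    # well inside the range of a signed 16-bit integer
    round(x / step) * step
}

set.seed(1)
r <- runif(1000, min = -1, max = 1)   # stand-in correlation coefficients
err <- pack_real16(r) - r
max(abs(err))   # no larger than 5e-05 (half a step), up to floating-point noise

pack_real16(c(0.00004, 0.00006))   # values below 0.0001 round to 0 or 0.0001
```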

Value

If outgds=NULL, a list is returned with the following components:

sample.id

the sample ids used in the analysis

snp.id

the SNP ids used in the analysis

snpcorr

a matrix of correlation coefficients with one row per eigenvector and one column per SNP ("# of eigenvectors" x "# of SNPs")
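Conceptually, each entry of snpcorr is a Pearson correlation between one eigenvector (one value per sample) and one SNP's genotype dosages (0, 1, 2). The base R sketch below reproduces that shape on simulated data. It is not SNPRelate's implementation: the simulated genotypes, the eigendecomposition of a centered and scaled genotype matrix used as a stand-in for snpgdsPCA, and the pairwise-complete handling of missing genotypes are all assumptions made for illustration.

```r
# Minimal sketch on simulated data; not SNPRelate's implementation.
set.seed(42)
n_samp <- 50; n_snp <- 200
p <- runif(n_snp, 0.1, 0.5)                           # allele frequencies
geno <- sapply(p, function(f) rbinom(n_samp, 2, f))   # samples x SNPs, values 0/1/2
geno[sample(length(geno), 20)] <- NA                  # a few missing genotypes
rownames(geno) <- paste0("s", seq_len(n_samp))        # sample ids

# Stand-in PCA (assumption): center and scale each SNP, set missing entries
# to 0, then eigendecompose the sample-by-sample covariance matrix.
z <- scale(geno)
z[is.na(z)] <- 0
eig <- eigen(tcrossprod(z) / ncol(z), symmetric = TRUE)
eigenvect <- eig$vectors
rownames(eigenvect) <- rownames(geno)

# Correlate the first 4 eigenvectors with every SNP; the result has the
# same layout as snpcorr: "# of eigenvectors" x "# of SNPs".
eig.which <- 1:4
snpcorr <- cor(eigenvect[, eig.which], geno, use = "pairwise.complete.obs")
dim(snpcorr)   # 4 200
```

A SNP with no genotype variation would give a correlation of NA here, since its standard deviation is zero.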

Author(s)

Xiuwen Zheng

References

Patterson N, Price AL, Reich D (2006) Population structure and eigenanalysis. PLoS Genetics 2:e190.

See Also

snpgdsPCA, snpgdsPCASampLoading, snpgdsPCASNPLoading

Examples

library(SNPRelate)

# open an example dataset (HapMap)
genofile <- snpgdsOpen(snpgdsExampleFileName())
# get chromosome index
chr <- read.gdsn(index.gdsn(genofile, "snp.chromosome"))

pca <- snpgdsPCA(genofile)
cr <- snpgdsPCACorr(pca, genofile, eig.which=1:4)
plot(abs(cr$snpcorr[3,]), xlab="SNP Index", ylab="PC 3", col=chr)


# alternatively, write the correlations to a GDS file (useful when memory is limited)
snpgdsPCACorr(pca, genofile, eig.which=1:4, outgds="test.gds")

(f <- openfn.gds("test.gds"))
m <- read.gdsn(index.gdsn(f, "correlation"))
closefn.gds(f)

# check
summary(c(m - cr$snpcorr))  # absolute differences should be below 1e-4 (packedreal16 rounding)


# close the file
snpgdsClose(genofile)

# delete the temporary file
unlink("test.gds", force=TRUE)

Example output

Loading required package: gdsfmt
SNPRelate -- supported by Streaming SIMD Extensions 2 (SSE2)
Principal Component Analysis (PCA) on genotypes:
Excluding 365 SNPs on non-autosomes
Excluding 1 SNP (monomorphic: TRUE, MAF: NaN, missing rate: NaN)
    # of samples: 279
    # of SNPs: 8,722
    using 1 thread
    # of principal components: 32
PCA:    the sum of all selected genotypes (0,1,2) = 2446510
CPU capabilities: Double-Precision SSE2
Wed Feb 17 17:09:36 2021    (internal increment: 408)

[..................................................]  0%, ETC: ---        
[==================================================] 100%, completed, 0s
Wed Feb 17 17:09:36 2021    Begin (eigenvalues and eigenvectors)
Wed Feb 17 17:09:36 2021    Done.
SNP Correlation:
    # of samples: 279
    # of SNPs: 9,088
    using 1 thread
Correlation:    the sum of all selected genotypes (0,1,2) = 2553065
Wed Feb 17 17:09:36 2021    (internal increment: 3288)

[..................................................]  0%, ETC: ---        
[==================================================] 100%, completed, 0s
Wed Feb 17 17:09:36 2021    Done.
SNP Correlation:
    # of samples: 279
    # of SNPs: 9,088
    using 1 thread
Creating 'test.gds' ...
Correlation:    the sum of all selected genotypes (0,1,2) = 2553065
Wed Feb 17 17:09:36 2021

[..................................................]  0%, ETC: ---        
[==================================================] 100%, completed, 0s
Wed Feb 17 17:09:36 2021    Done.
File: /work/tmp/test.gds (66.4K)
+    [  ]
|--+ sample.id   { Str8 279 LZMA_ra(30.5%), 701B }
|--+ snp.id   { Int32 9088 LZMA_ra(10.1%), 3.6K }
\--+ correlation   { PackedReal16 4x9088 LZMA_ra(86.4%), 61.3K }
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
 -5e-05  -2e-05   0e+00   0e+00   3e-05   5e-05      32 

SNPRelate documentation built on Nov. 8, 2020, 5:31 p.m.