snpgdsPCACorr: PC-correlated SNPs in principal component analysis
In zhengxwen/SNPRelate: Parallel Computing Toolset for Relatedness and Principal Component Analysis of SNP Data

Description

Calculates the correlation coefficients between eigenvectors and SNP genotypes.
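As a rough illustration of the quantity being computed, the base-R sketch below simulates a small genotype matrix (dosages 0, 1 or 2), takes the leading principal components with `prcomp`, and correlates each component with each SNP. This is not SNPRelate's implementation: snpgdsPCA standardizes genotypes differently and this sketch ignores missing genotypes, so the values will not match the package exactly. Because a correlation coefficient is unchanged when a vector is multiplied by a positive constant, using the PC scores `pc$x` gives the same correlations as using unit-length eigenvectors.

```r
# Base-R sketch of eigenvector-by-SNP correlations (not SNPRelate's code).
set.seed(1)
n.samp <- 50
n.snp  <- 200
# simulated genotypes: rows = samples, columns = SNPs, values 0/1/2
geno <- matrix(rbinom(n.samp * n.snp, size = 2, prob = 0.3),
               nrow = n.samp, ncol = n.snp)
# drop monomorphic SNPs, whose correlation would be undefined
geno <- geno[, apply(geno, 2, var) > 0, drop = FALSE]

# PCA on the centered genotypes; columns of pc$x are the sample scores
pc <- prcomp(geno, center = TRUE, scale. = FALSE)
ev <- pc$x[, 1:4]

# cor(x, y) correlates every column of x with every column of y,
# giving a 4 x (number of SNPs) matrix
snpcorr <- cor(ev, geno)
dim(snpcorr)
```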

Usage

snpgdsPCACorr(pcaobj, gdsobj, snp.id=NULL, eig.which=NULL, num.thread=1L,
    with.id=TRUE, outgds=NULL, verbose=TRUE)

Arguments

`pcaobj`	a `snpgdsPCAClass` object returned by snpgdsPCA, a `snpgdsEigMixClass` object returned by snpgdsEIGMIX, or an eigenvector matrix whose row names are the sample IDs
`gdsobj`	an object of class `SNPGDSFileClass`, a SNP GDS file
`snp.id`	a vector of SNP IDs specifying the selected SNPs; if NULL, all SNPs are used
`eig.which`	an integer vector specifying which eigenvectors to use
`num.thread`	the number of CPU cores to use; if `NA`, the number of cores is detected automatically
`with.id`	if `TRUE`, the returned value includes `sample.id` and `snp.id`
`outgds`	`NULL`, or a file name (character string) for exporting the correlations to a GDS file; see Details
`verbose`	if TRUE, show information
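Since `pcaobj` may also be a plain eigenvector matrix, eigenvectors computed outside SNPRelate can be supplied as long as the row names are sample IDs. The sketch below builds such a matrix from simulated genotypes with base R's `svd`, as a stand-in for snpgdsPCA (which standardizes genotypes differently). The sample IDs `S01` to `S30` and the column labels `EV1` to `EV3` are hypothetical. The final `snpgdsPCACorr` call is only a comment, because it needs SNPRelate and an open GDS file (assumed here to be `genofile`) that contains the same sample IDs.

```r
# Build an eigenvector matrix with sample-ID row names (sketch).
set.seed(2)
ids <- sprintf("S%02d", 1:30)            # hypothetical sample IDs
geno <- matrix(rbinom(30 * 100, size = 2, prob = 0.4), nrow = 30,
               dimnames = list(ids, NULL))
# drop monomorphic SNPs
geno <- geno[, apply(geno, 2, var) > 0, drop = FALSE]

# left singular vectors of the centered genotype matrix X are the
# unit-length eigenvectors of the sample-by-sample matrix X %*% t(X)
sv <- svd(scale(geno, center = TRUE, scale = FALSE))
eigmat <- sv$u[, 1:3]
rownames(eigmat) <- rownames(geno)      # row names carry the sample IDs
colnames(eigmat) <- paste0("EV", 1:3)   # hypothetical column labels

# With SNPRelate loaded and 'genofile' open on a GDS file holding the
# same sample IDs, the matrix could then be passed as pcaobj:
# cr <- snpgdsPCACorr(eigmat, genofile)
```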

Details

If an output file name is specified via `outgds`, the nodes "sample.id", "snp.id" and "correlation" are stored in the GDS file. The node "correlation" is a matrix of correlation coefficients stored in the packed real format "packedreal16", which preserves 4 decimal digits (0.0001 is the smallest number greater than zero; see add.gdsn).
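The precision statement can be checked with a little arithmetic. Assuming the packed format stores `round(x / 0.0001)` as a signed 16-bit integer with zero offset (an assumption about the encoding, not taken from the gdsfmt source), every stored value lies within half a step (0.00005) of the original, and the representable range of about ±3.2767 comfortably covers correlations in [-1, 1]. The functions `pack16` and `unpack16` are hypothetical emulations written only for this sketch.

```r
# Hypothetical emulation of 16-bit packed reals (scale 1e-4, offset 0).
pack16 <- function(x, scale = 1e-4) {
  code <- round(x / scale)               # nearest integer code
  stopifnot(all(abs(code) <= 32767))     # must fit in a signed 16-bit integer
  as.integer(code)
}
unpack16 <- function(code, scale = 1e-4) code * scale

set.seed(3)
r  <- runif(1000, min = -1, max = 1)     # stand-in correlation coefficients
r2 <- unpack16(pack16(r))
max(abs(r - r2))                         # at most half a step (5e-5)
unpack16(pack16(0.00004))                # below half a step, so stored as 0
```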

Value

If `outgds=NULL`, returns a list with the following components:

`sample.id`	the sample ids used in the analysis
`snp.id`	the SNP ids used in the analysis
`snpcorr`	a matrix of correlation coefficients with one row per eigenvector and one column per SNP
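Assuming `with.id=TRUE`, the returned list can be indexed directly: row k of `snpcorr` corresponds to the k-th eigenvector used and column j to `snp.id[j]`, so the SNP most strongly correlated with each eigenvector can be looked up as sketched below. The object `cr` here is a mock filled with random numbers purely to show the indexing; it is not real output.

```r
# Mock object with the documented components (random stand-in values).
set.seed(4)
cr <- list(sample.id = sprintf("S%02d", 1:20),
           snp.id    = sprintf("rs%d", 1:8),
           snpcorr   = matrix(runif(3 * 8, min = -1, max = 1), nrow = 3))

# rows = eigenvectors, columns = SNPs
stopifnot(ncol(cr$snpcorr) == length(cr$snp.id))

# for each eigenvector, the SNP with the largest absolute correlation
top <- cr$snp.id[apply(abs(cr$snpcorr), 1, which.max)]
top
```

The same indexing should apply unchanged to the `cr` object returned in the Examples, since it is created with the default `with.id=TRUE`.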

Author(s)

Xiuwen Zheng

Examples

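# load SNPRelate (this also attaches gdsfmt, which provides openfn.gds and read.gdsn)
library(SNPRelate)

# open an example dataset (HapMap)
genofile <- snpgdsOpen(snpgdsExampleFileName())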
# get chromosome index
chr <- read.gdsn(index.gdsn(genofile, "snp.chromosome"))

pca <- snpgdsPCA(genofile)
cr <- snpgdsPCACorr(pca, genofile, eig.which=1:4)
plot(abs(cr$snpcorr[3,]), xlab="SNP Index", ylab="PC 3", col=chr)


# write the correlations to a GDS file instead when memory is limited
snpgdsPCACorr(pca, genofile, eig.which=1:4, outgds="test.gds")

(f <- openfn.gds("test.gds"))
m <- read.gdsn(index.gdsn(f, "correlation"))
closefn.gds(f)

# check: the stored values should differ from cr$snpcorr by less than 1e-4
summary(c(m - cr$snpcorr))


# close the file
snpgdsClose(genofile)

# delete the temporary file
unlink("test.gds", force=TRUE)

zhengxwen/SNPRelate documentation built on Nov. 19, 2024, 1:02 p.m.