snpgdsPCACorr: PC-correlated SNPs in principal component analysis

View source: R/PCA.R

snpgdsPCACorr R Documentation

PC-correlated SNPs in principal component analysis

Description

Calculates the correlations between eigenvectors and SNP genotypes.

Usage

snpgdsPCACorr(pcaobj, gdsobj, snp.id=NULL, eig.which=NULL, num.thread=1L,
    with.id=TRUE, outgds=NULL, verbose=TRUE)

Arguments

pcaobj

a snpgdsPCAClass object returned from the function snpgdsPCA, a snpgdsEigMixClass object returned from snpgdsEIGMIX, or an eigenvector matrix whose row names are the sample IDs

gdsobj

an object of class SNPGDSFileClass, a SNP GDS file

snp.id

a vector of SNP IDs specifying the selected SNPs; if NULL, all SNPs are used

eig.which

a vector of integers specifying which eigenvectors to use

num.thread

the number of (CPU) cores used; if NA, detect the number of cores automatically

with.id

if TRUE, the returned value includes sample.id and snp.id

outgds

NULL, or a character string giving the name of a GDS file to which the correlations are exported; see Details

verbose

if TRUE, show information

Details

If an output file name is specified via outgds, "sample.id", "snp.id" and "correlation" will be stored in the GDS file. The GDS node "correlation" is a matrix of correlation coefficients stored in a packed real-number format ("packedreal16", which preserves 4 decimal digits; 0.0001 is the smallest number greater than zero; see add.gdsn).
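The precision limit described above can be illustrated with a minimal base-R sketch. It only mimics fixed-point storage with a step of 0.0001 and is not the gdsfmt implementation; the helper quantize and the random stand-in values are hypothetical and used for illustration only.

```r
# Illustrative only: mimic fixed-point storage with a step of 1e-4.
# `quantize` is a hypothetical helper, not part of SNPRelate or gdsfmt.
quantize <- function(x, step = 1e-4) round(x / step) * step

set.seed(1)
r <- runif(1000, min = -1, max = 1)   # stand-in correlation coefficients
q <- quantize(r)                      # values as they would look after packing

# Largest absolute round-off error introduced by the quantization
max_err <- max(abs(q - r))
print(max_err < 1e-4)
```

Under this assumption, values read back from the GDS file can differ slightly from the in-memory result, which is why the comparison in the Examples section checks for a small difference rather than exact equality.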

Value

Returns a list if outgds=NULL:

sample.id

the sample ids used in the analysis

snp.id

the SNP ids used in the analysis

snpcorr

a matrix of correlation coefficients, "# of eigenvectors" x "# of SNPs"
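The layout of the returned list can be sketched with toy data in base R. This is an illustration under stated assumptions, not the package's code: it assumes a Pearson correlation (the documentation above does not name the coefficient), and the objects geno, eig and toy are made up for the example.

```r
set.seed(42)
n_samp <- 50; n_snp <- 20; n_eig <- 4

# Toy genotypes coded 0/1/2 (samples in rows, SNPs in columns)
geno <- matrix(sample(0:2, n_samp * n_snp, replace = TRUE), nrow = n_samp)
# Toy "eigenvectors" (samples in rows), standing in for a PCA result
eig <- matrix(rnorm(n_samp * n_eig), nrow = n_samp)

# Pearson correlation of every eigenvector with every SNP;
# the result is n_eig x n_snp, the same orientation as `snpcorr`
toy <- list(snp.id = seq_len(n_snp), snpcorr = cor(eig, geno))

# SNPs most strongly correlated (in absolute value) with the third eigenvector
ord <- order(abs(toy$snpcorr[3, ]), decreasing = TRUE)
top <- head(data.frame(snp.id = toy$snp.id[ord], corr = toy$snpcorr[3, ord]), 5)
print(top)
```

The same indexing, one row of snpcorr per eigenvector, applies to the list returned by snpgdsPCACorr, so the ranking step can be reused on a real result.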

Author(s)

Xiuwen Zheng

References

Patterson N, Price AL, Reich D (2006) Population structure and eigenanalysis. PLoS Genetics 2:e190.

See Also

snpgdsPCA, snpgdsPCASampLoading, snpgdsPCASNPLoading

Examples

# open an example dataset (HapMap)
genofile <- snpgdsOpen(snpgdsExampleFileName())
# get chromosome index
chr <- read.gdsn(index.gdsn(genofile, "snp.chromosome"))

pca <- snpgdsPCA(genofile)
cr <- snpgdsPCACorr(pca, genofile, eig.which=1:4)
plot(abs(cr$snpcorr[3,]), xlab="SNP Index", ylab="PC 3", col=chr)


# output to a GDS file if memory is limited
snpgdsPCACorr(pca, genofile, eig.which=1:4, outgds="test.gds")

(f <- openfn.gds("test.gds"))
m <- read.gdsn(index.gdsn(f, "correlation"))
closefn.gds(f)

# check
summary(c(m - cr$snpcorr))  # differences should be < 1e-4


# close the file
snpgdsClose(genofile)

# delete the temporary file
unlink("test.gds", force=TRUE)

zhengxwen/SNPRelate documentation built on April 16, 2024, 8:42 a.m.