seqPCA: Run PCA-seq on a GDS file

Description

This function calculates the Genetic Relatedness Matrix (GRM) on a GDS file using the PCA-seq method for sequence data.

Usage

seqPCA(gdsobj, weights = c(1, 1), sample.id = NULL, snp.id = NULL,
  autosome.only = TRUE, remove.monosnp = TRUE, maf = NaN,
  missing.rate = NaN, eigen.cnt = 32, need.genmat = FALSE,
  verbose = TRUE)
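
A minimal call might look like the sketch below. It assumes that both the SNPRelate and PCAseq packages are installed and that the small example GDS file shipped with SNPRelate is an acceptable input; the thresholds are arbitrary illustration values. The guard skips the call when either package is missing.

```r
# Sketch only: the call runs when both packages are available.
has_pkgs <- requireNamespace("SNPRelate", quietly = TRUE) &&
  requireNamespace("PCAseq", quietly = TRUE)

if (has_pkgs) {
  library(SNPRelate)
  library(PCAseq)

  # Open the example SNP GDS file bundled with SNPRelate
  genofile <- snpgdsOpen(snpgdsExampleFileName())

  # Keep SNPs with MAF >= 0.01 and missing rate <= 0.05; return 8 components
  pca <- seqPCA(genofile, weights = c(1, 1), maf = 0.01,
                missing.rate = 0.05, eigen.cnt = 8)
  print(head(pca$eigenvect))

  # Always close the GDS file when done
  snpgdsClose(genofile)
}
```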

Arguments

gdsobj

an object of the class SNPGDSFileClass, a SNP GDS file.

weights

a vector of two numbers, indicating the parameters to use for the beta function weights; see Details.

sample.id

a vector of sample ids specifying the samples to use for analysis; if NULL, all samples are used.

snp.id

a vector of SNP ids specifying the SNPs to use for analysis; if NULL, all SNPs are used.

autosome.only

if TRUE, use autosomal SNPs only; if it is a numeric or character vector, keep SNPs according to the specified chromosomes.

remove.monosnp

if TRUE, remove monomorphic SNPs.

maf

if one number is specified, use SNPs with MAF greater than or equal to this value; if a numeric vector of length two is specified, only SNPs with MAFs in (min, max) are taken.

missing.rate

use SNPs with missing rates less than or equal to missing.rate; if NaN, no missing-rate threshold is applied.

eigen.cnt

the number of eigenvectors and eigenvalues to return; if zero, return all eigenvalues and eigenvectors.

need.genmat

if TRUE, return the genetic relatedness matrix.

verbose

Not supported.
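
The SNP filters described above (remove.monosnp, maf, missing.rate) can be illustrated on a toy genotype matrix in base R. This is a sketch of the filtering logic as the argument descriptions state it, not the package's internal code; the matrix, the variable names, and the threshold values are made up for illustration.

```r
# Toy genotype matrix: rows = samples, columns = SNPs, values = allele counts (0/1/2)
geno <- cbind(
  snp1 = c(0, 1, 2, 1, 0, 1),
  snp2 = c(0, 0, 0, 0, 0, 0),     # monomorphic
  snp3 = c(0, 0, 1, NA, NA, NA),  # half of the calls are missing
  snp4 = c(2, 2, 1, 2, 2, 2)
)

# Per-SNP allele frequency, minor allele frequency, and missing rate
af   <- colMeans(geno, na.rm = TRUE) / 2
maf  <- pmin(af, 1 - af)
miss <- colMeans(is.na(geno))

# Hypothetical thresholds mirroring the arguments
maf.min      <- 0.05
missing.rate <- 0.25

keep <- maf > 0 &          # remove.monosnp = TRUE
  maf >= maf.min &         # maf given as a single number
  miss <= missing.rate     # missing.rate threshold

kept <- names(keep)[keep]
print(kept)
```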

Details

If method is "eigen", the GRM is calculated using the EIGENSTRAT method as given in Patterson et al. (2006). If method is "pcaseq", the GRM is calculated using the PCA-seq method.
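
As a rough illustration of how beta-function weights behave, the sketch below assumes that each SNP's weight is the Beta(a, b) density evaluated at its minor allele frequency, with weights = c(a, b). That mapping is an assumption made for this example and is not confirmed by this page; beta_weight is a hypothetical helper, not a package function.

```r
# Assumption: weight for a SNP = Beta(a, b) density at its minor allele frequency
beta_weight <- function(maf, weights = c(1, 1)) {
  dbeta(maf, shape1 = weights[1], shape2 = weights[2])
}

maf <- c(0.01, 0.05, 0.20, 0.50)

w_flat <- beta_weight(maf, c(1, 1))      # Beta(1, 1) is uniform: equal weights
w_rare <- beta_weight(maf, c(1, 25))     # density decreases with MAF: rare variants up-weighted
w_std  <- beta_weight(maf, c(0.5, 0.5))  # equals 1 / (pi * sqrt(p * (1 - p)))

print(rbind(maf, w_flat, w_rare, w_std))
```

Under this assumption, the default c(1, 1) weights every SNP equally, while c(0.5, 0.5) is proportional to the 1 / sqrt(p(1 - p)) scaling used when genotypes are standardized in the style of Patterson et al. (2006).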

Value

Returns a snpPCAClass object, a list with the following components:

weights

the parameters used to define the weights used to calculate the GRM

maf

the MAF cutoffs used

sample.id

the sample ids used in the analysis

snp.id

the SNP ids used in the analysis

eigenval

eigenvalues

eigenvect

a matrix of eigenvectors, with one row per sample and eigen.cnt columns

varprop

the proportion of the variance explained by each principal component

TraceXTX

the trace of the genetic relatedness matrix

Bayesian

indicates Bayes normalization; set to FALSE, as this is not currently supported

genmat

the genetic relatedness matrix
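
How eigenval, eigenvect, varprop, and TraceXTX relate to the GRM can be sketched with base R on a small simulated matrix. The data are random and the GRM here is a plain standardized cross-product. Computing varprop as each eigenvalue divided by the sum of all eigenvalues is an assumption about the definition, not taken from the package source.

```r
set.seed(1)

# Toy column-standardized "genotype" matrix: 6 samples x 40 SNPs
Z <- scale(matrix(rnorm(6 * 40), nrow = 6))

# GRM-like matrix: samples x samples
G <- tcrossprod(Z) / ncol(Z)

eigen.cnt <- 3
ev <- eigen(G, symmetric = TRUE)

eigenval  <- ev$values                                       # all eigenvalues, largest first
eigenvect <- ev$vectors[, seq_len(eigen.cnt), drop = FALSE]  # samples x eigen.cnt
TraceXTX  <- sum(diag(G))                                    # trace of the GRM
varprop   <- eigenval / sum(eigenval)                        # assumed definition

print(dim(eigenvect))
print(round(varprop, 3))
```

The trace of a symmetric matrix equals the sum of its eigenvalues, so TraceXTX and sum(eigenval) agree up to rounding when all eigenvalues are kept.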

Note

If you need to run the EIGENSTRAT method on a very large data set and do not need to subset by both a minimum and maximum MAF, the snpgdsPCA function will be faster.
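
For comparison, a call to snpgdsPCA from SNPRelate with a single minimum MAF cutoff might look like the following sketch. It assumes SNPRelate is installed and uses its bundled example file; the cutoff and component count are arbitrary.

```r
# Sketch only: the call runs when SNPRelate is available.
has_snprelate <- requireNamespace("SNPRelate", quietly = TRUE)

if (has_snprelate) {
  library(SNPRelate)

  genofile <- snpgdsOpen(snpgdsExampleFileName())

  # Single minimum MAF cutoff, first 8 principal components
  pca <- snpgdsPCA(genofile, maf = 0.05, eigen.cnt = 8, verbose = FALSE)
  print(head(pca$varprop))

  snpgdsClose(genofile)
}
```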


jellily/PCAseq documentation built on May 19, 2019, 4:02 a.m.