xxt: X.X-transpose for a standardized SnpMatrix
In NikNakk/snpStats: SnpMatrix and XSnpMatrix classes and methods


Description

The input SnpMatrix is first standardized by subtracting the mean (or stratum mean) from each call and dividing by the expected standard deviation under Hardy-Weinberg equilibrium. It is then post-multiplied by its transpose. This is a preliminary step in the computation of principal components.

Usage

xxt(snps, strata = NULL, correct.for.missing = FALSE, lower.only = FALSE, uncertain = FALSE)

Arguments

`snps`	The input matrix, of class `"SnpMatrix"`
`strata`	A `factor` (or an object that can be coerced into a `factor`) with length equal to the number of rows of `snps`, defining stratum membership
`correct.for.missing`	If `TRUE`, an attempt is made to correct for the effect of missing data by use of inverse probability weights. Otherwise, missing observations are scored zero in the standardized matrix.
`lower.only`	If `TRUE`, only the lower triangle of the result is returned and the upper triangle is filled with zeros. Otherwise, the complete symmetric matrix is returned.
`uncertain`	If `TRUE`, uncertain genotypes are replaced by posterior expectations. Otherwise, they are treated as missing values.

Details

This computation is the first step in calculating principal components for genome-wide SNP data. As pointed out by Price et al. (2006), when the data matrix has more columns (SNPs) than rows (subjects), it is most efficient to calculate the eigenvectors of X.X-transpose, where X is a SnpMatrix whose columns have been standardized to zero mean and unit variance. For autosomes, genotypes are coded 0, 1 or 2; after subtraction of the mean, 2p, they are divided by the standard deviation sqrt(2p(1-p)), where p is the estimated allele frequency. For SNPs on the X chromosome in male subjects, genotypes are coded 0 or 2. The mean is then still 2p, but the standard deviation is 2sqrt(p(1-p)). If `strata` is supplied, a stratum-specific estimate of p is used for standardization.
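The autosomal standardization described above can be sketched in base R as follows. This is a hypothetical illustration, not the package's implementation: it assumes a plain numeric matrix `g` of genotype codes 0, 1, 2 (rows = subjects, columns = SNPs) with `NA` for no-calls instead of a real `SnpMatrix`, ignores the X-chromosome coding, and scores missing calls as zero (the `correct.for.missing = FALSE` behaviour). The function name `std_xxt_sketch` is made up for this example.

```r
# Hypothetical sketch of standardization followed by X %*% t(X); not snpStats code.
std_xxt_sketch <- function(g, strata = NULL) {
  # Treat all subjects as a single stratum when none is given
  if (is.null(strata)) strata <- rep(1L, nrow(g))
  strata <- as.factor(strata)
  x <- matrix(0, nrow(g), ncol(g))
  for (s in levels(strata)) {
    rows <- which(strata == s)
    gs <- g[rows, , drop = FALSE]
    # Estimated allele frequency p for each SNP within this stratum
    p <- colMeans(gs, na.rm = TRUE) / 2
    # Subtract the mean 2p, then divide by sqrt(2p(1-p)), column by column
    z <- sweep(gs, 2, 2 * p, "-")
    z <- sweep(z, 2, sqrt(2 * p * (1 - p)), "/")
    # No-calls (and 0/0 from monomorphic SNPs) contribute zero,
    # i.e. missing calls are replaced by their mean
    z[is.na(z)] <- 0
    x[rows, ] <- z
  }
  # X %*% t(X): a subjects-by-subjects matrix
  tcrossprod(x)
}
```

For example, `std_xxt_sketch(g, strata = c("a", "a", "b"))` would estimate p separately within each of the two strata. Using `tcrossprod()` rather than `x %*% t(x)` avoids forming the transpose explicitly.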

Missing observations present some difficulty. Price et al. (2006) recommended replacing missing observations by their means, which is equivalent to replacing them by zeros in the standardized matrix. However, this gives a biased estimate of the complete-data result. Optionally, this bias can be corrected by inverse probability weighting. We assume that the probability that any one call is missing is small and can be predicted by a multiplicative model with row (subject) and column (locus) effects. The estimated probability of a missing value in a given row and column is then m = RC/T, where R is the row total of no-calls, C is the column total of no-calls, and T is the overall total of no-calls. Non-missing contributions to X.X-transpose are then weighted by w = 1/(1-m) for contributions to the diagonal elements, and by the product of the relevant pair of weights for contributions to off-diagonal elements.
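The weighting scheme just described can be sketched in base R. This is a hypothetical illustration that follows the text (m = RC/T, w = 1/(1-m), a single weight on diagonal terms, a product of two weights on off-diagonal terms); it is not the package's C code. It assumes `x` is the standardized matrix with missing calls already set to zero and `miss` is a logical matrix of the same shape flagging no-calls. The names `ipw_xxt_sketch`, `x` and `miss` are made up for this example.

```r
# Hypothetical sketch of inverse-probability-weighted X.X-transpose; not snpStats code.
ipw_xxt_sketch <- function(x, miss) {
  r_tot <- rowSums(miss)      # R: no-calls per subject (row totals)
  c_tot <- colSums(miss)      # C: no-calls per SNP (column totals)
  tot <- sum(miss)            # T: overall number of no-calls
  if (tot == 0) return(tcrossprod(x))  # nothing to correct
  m <- outer(r_tot, c_tot) / tot       # estimated P(missing) for each call
  # The model assumes m is small; m >= 1 would give infinite weights
  if (any(m >= 1)) stop("missingness too high for this approximation")
  w <- 1 / (1 - m)                     # inverse probability weights
  # Off-diagonal (i, j): sum over SNPs k of x[i,k] w[i,k] x[j,k] w[j,k]
  out <- tcrossprod(x * w)
  # Diagonal i: sum over k of x[i,k]^2 w[i,k] (a single weight)
  diag(out) <- rowSums(x^2 * w)
  out
}
```

Because missing entries of `x` are already zero, they contribute nothing to either sum; only non-missing contributions are up-weighted.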

Value

A square matrix containing either the complete X.X-transpose matrix or only its lower triangle (with the upper triangle filled with zeros), depending on `lower.only`.
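If the result was requested with `lower.only = TRUE`, the full symmetric matrix can be rebuilt by reflecting the lower triangle. A minimal base-R sketch, assuming `lo` holds such a zero-filled lower-triangular matrix; the helper name `fill_upper` is hypothetical.

```r
# Hypothetical helper: rebuild a symmetric matrix from its lower triangle
fill_upper <- function(lo) {
  # Add the transpose, then subtract the diagonal once so it is not doubled.
  # Passing nrow() keeps diag() correct even for a 1 x 1 matrix.
  lo + t(lo) - diag(diag(lo), nrow(lo))
}
```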

Warning

The correction for missing observations can produce an output matrix that is not positive semi-definite. This should not matter for the application for which it is intended.

Note

In genome-wide studies, the SNP data will usually be held as a series of objects (of class "SnpMatrix" or "XSnpMatrix"), one per chromosome. The X.X-transpose matrices produced by applying the xxt function to each object in turn can be added to yield the genome-wide result.
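The additivity follows because X.X-transpose is a sum of contributions from individual columns (SNPs) and the standardization acts column by column. A minimal base-R check, assuming a small already-standardized matrix `x` split into two hypothetical column blocks standing in for two chromosomes (no missing-data correction involved):

```r
set.seed(42)
x <- matrix(rnorm(5 * 8), nrow = 5)   # 5 subjects, 8 standardized SNPs
chr1 <- x[, 1:3]                      # hypothetical "chromosome 1" columns
chr2 <- x[, 4:8]                      # hypothetical "chromosome 2" columns
genome <- tcrossprod(x)               # whole-genome X %*% t(X)
by_chr <- tcrossprod(chr1) + tcrossprod(chr2)  # sum of per-chromosome results
all.equal(genome, by_chr)
```

With real data, the same idea would amount to something like `Reduce("+", lapply(objs, xxt))`, where `objs` is a hypothetical list holding one SnpMatrix per chromosome.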

If the matrix is converted to a correlation matrix by pre- and post-multiplying it by the inverse square root of its diagonal, the result is an unbiased estimate of twice the kinship matrix.
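Pre- and post-multiplying by D^(-1/2), where D is the diagonal of the matrix, is what base R's `cov2cor()` computes. A minimal sketch, assuming `xx` is a result of `xxt()` with a strictly positive diagonal; the toy matrix below is made up purely for illustration.

```r
xx <- matrix(c(4, 2, 1,
               2, 9, 3,
               1, 3, 16), nrow = 3, byrow = TRUE)  # toy symmetric matrix
d_inv_sqrt <- diag(1 / sqrt(diag(xx)))            # D^(-1/2)
k2 <- d_inv_sqrt %*% xx %*% d_inv_sqrt             # explicit D^(-1/2) xx D^(-1/2)
all.equal(k2, cov2cor(xx))                         # same result via stats::cov2cor
```

Per the note above, `k2` would then be read as an estimate of twice the kinship matrix; its diagonal is exactly 1 by construction.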

Author(s)

David Clayton <dc208@cam.ac.uk>

References

Price et al. (2006) Principal components analysis corrects for stratification in genome-wide association studies. Nature Genetics, 38:904-909.

Examples

library(snpStats)
# Make a SnpMatrix with a small number of rows (subjects)
data(testdata)
small <- Autosomes[1:100,]
# Calculate the X.X-transpose matrix, correcting for missing calls
xx <- xxt(small, correct.for.missing=TRUE)
# Calculate the principal components (eigenvectors of xx)
pc <- eigen(xx, symmetric=TRUE)$vectors

NikNakk/snpStats documentation built on May 7, 2019, 6:18 p.m.
