pca.bed: Principal Component Analysis (PCA) on whole genome data with

In VariantScan: A Machine Learning Tool for Genetic Association Studies

pca	R Documentation

Principal Component Analysis (PCA) on whole genome data with

Description

Fast implementation of Principal Component Analysis (PCA) on whole genome data.

Usage

pca(genfile, sample.id = NULL, snp.id = NULL, autosome.only = TRUE, 
remove.monosnp = TRUE, maf = NaN, missing.rate = NaN, 
algorithm = c("exact", "randomized"), 
eigen.cnt = ifelse(identical(algorithm, "randomized"), 16L, 32L),
num.thread = 1L, bayesian = FALSE, need.genmat = FALSE, 
genmat.only = FALSE, eigen.method = c("DSPEVX", "DSPEV"), 
aux.dim = eigen.cnt * 2L, iter.num = 10L, verbose = TRUE,...)

## S3 method for class 'pca.bed'
pca.bed(genfile, sample.id = NULL, snp.id = NULL, autosome.only = TRUE, 
remove.monosnp = TRUE, maf = NaN, missing.rate = NaN, 
algorithm = c("exact", "randomized"), 
eigen.cnt = ifelse(identical(algorithm, "randomized"), 16L, 32L),
num.thread = 1L, bayesian = FALSE, need.genmat = FALSE, 
genmat.only = FALSE, eigen.method = c("DSPEVX", "DSPEV"), 
aux.dim = eigen.cnt * 2L, iter.num = 10L, verbose = TRUE,...)

## S3 method for class 'pca.vcf'
pca.vcf(genfile, sample.id = NULL, snp.id = NULL, autosome.only = TRUE, 
remove.monosnp = TRUE, maf = NaN, missing.rate = NaN, 
algorithm = c("exact", "randomized"), 
eigen.cnt = ifelse(identical(algorithm, "randomized"), 16L, 32L),
num.thread = 1L, bayesian = FALSE, need.genmat = FALSE, 
genmat.only = FALSE, eigen.method = c("DSPEVX", "DSPEV"), 
aux.dim = eigen.cnt * 2L, iter.num = 10L, verbose = TRUE,...)

## S3 method for class 'pca.gds'
pca.gds(genfile, sample.id = NULL, snp.id = NULL, autosome.only = TRUE, 
remove.monosnp = TRUE, maf = NaN, missing.rate = NaN, 
algorithm = c("exact", "randomized"), 
eigen.cnt = ifelse(identical(algorithm, "randomized"), 16L, 32L),
num.thread = 1L, bayesian = FALSE, need.genmat = FALSE, 
genmat.only = FALSE, eigen.method = c("DSPEVX", "DSPEV"), 
aux.dim = eigen.cnt * 2L, iter.num = 10L, verbose = TRUE,...)

Arguments

`genfile`	genetic dataset containing sample IDs and SNP IDs; supported formats are BED (PLINK), VCF, and GDS
`sample.id`	a vector of sample IDs specifying the selected samples; if NULL, all samples are used
`snp.id`	a vector of SNP IDs specifying the selected SNPs; if NULL, all SNPs are used
`autosome.only`	if TRUE, use autosomal SNPs only; if a numeric or character value, keep only SNPs on the specified chromosome
`remove.monosnp`	if TRUE, remove monomorphic SNPs
`maf`	keep only SNPs with minor allele frequency ">= maf"; if NaN, no MAF threshold is applied
`missing.rate`	keep only SNPs with missing rate "<= missing.rate"; if NaN, no missing-rate threshold is applied
`algorithm`	"exact", traditional exact calculation; "randomized", fast PCA with the randomized algorithm introduced in Galinsky et al. (2016)
`eigen.cnt`	the number of eigenvectors to output; if eigen.cnt <= 0, all eigenvectors are returned
`num.thread`	the number of CPU cores used; if NA, the number of cores is detected automatically
`bayesian`	if TRUE, use Bayesian normalization
`need.genmat`	if TRUE, return the genetic covariance matrix
`genmat.only`	if TRUE, return the genetic covariance matrix only and do not compute the eigenvalues and eigenvectors
`eigen.method`	"DSPEVX", compute the top eigen.cnt eigenvalues and eigenvectors using LAPACK::DSPEVX; "DSPEV", use LAPACK::DSPEV for compatibility with SNPRelate_1.1.6 or earlier; "DSPEVX" is significantly faster than "DSPEV" if only the top principal components are of interest
`aux.dim`	auxiliary dimension used in the fast randomized algorithm
`iter.num`	number of iterations used in the fast randomized algorithm
`verbose`	if TRUE, show information
`...`	more arguments
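
To make the `remove.monosnp`, `maf`, and `missing.rate` filters concrete, the following base R sketch computes the per-SNP minor allele frequency and missing rate on a toy genotype matrix and applies thresholds in the ">= maf" and "<= missing.rate" sense described above. The matrix, the variable names, and the threshold values are hypothetical, and this is only an illustration of the filtering rule, not the package's internal code.

```r
# Toy genotype matrix: rows are samples, columns are SNPs.
# Entries count copies of one allele (0, 1, 2); NA marks a missing call.
geno <- matrix(
  c(0, 1, 2, NA,
    0, 0, 1, NA,
    0, 1, 2,  2,
    0, 2, 2, NA),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("s", 1:4), paste0("snp", 1:4))
)

# Per-SNP allele frequency over non-missing calls, folded to the minor allele
allele_freq <- colMeans(geno, na.rm = TRUE) / 2
maf_obs <- pmin(allele_freq, 1 - allele_freq)

# Per-SNP fraction of missing calls
miss_obs <- colMeans(is.na(geno))

# A SNP is monomorphic when only one allele is observed
mono <- maf_obs == 0

# Hypothetical thresholds, playing the role of maf and missing.rate
maf_min <- 0.05
miss_max <- 0.5

# Keep polymorphic SNPs that pass both thresholds
keep <- !mono & maf_obs >= maf_min & miss_obs <= miss_max
print(names(keep)[keep])
```

In this toy data, `snp1` is monomorphic and `snp4` is both monomorphic among its observed calls and missing in three of four samples, so only `snp2` and `snp3` are kept.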

Details

Efficient implementation of PCA that uses the Genomic Data Structure (GDS) format and parallel computing on multi-core symmetric multiprocessing architectures to accelerate computations on SNP data. The minor allele frequency and missing rate of each SNP in snp.id are calculated over all the samples in sample.id.
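
As a rough illustration of what an "exact" PCA on SNP data involves, the sketch below builds a sample-by-sample genetic covariance matrix from simulated genotypes and extracts its top eigenvectors with base R. It assumes a normalization in the spirit of Patterson et al. (2006), where each SNP is centered by twice its allele frequency and scaled by sqrt(p(1 - p)); the simulated data and all variable names are hypothetical, and the package's own implementation (LAPACK routines, multithreading, GDS access) is not reproduced here.

```r
set.seed(1)
n_samples <- 6
n_snps <- 50

# Simulate genotypes (0, 1, 2) with SNP-specific allele frequencies
freqs <- runif(n_snps, 0.1, 0.9)
geno <- sapply(freqs, function(p) rbinom(n_samples, size = 2, prob = p))

# Estimate allele frequencies and drop monomorphic SNPs,
# which would otherwise cause division by zero below
p_hat <- colMeans(geno) / 2
poly <- p_hat > 0 & p_hat < 1
geno <- geno[, poly, drop = FALSE]
p_hat <- p_hat[poly]

# Center each SNP by 2p and scale by sqrt(p(1 - p))
z <- sweep(geno, 2, 2 * p_hat, "-")
z <- sweep(z, 2, sqrt(p_hat * (1 - p_hat)), "/")

# Sample-by-sample genetic covariance matrix, averaged over SNPs
genmat <- tcrossprod(z) / ncol(z)

# Eigendecomposition; keep the top eigen_cnt components
eig <- eigen(genmat, symmetric = TRUE)
eigen_cnt <- 2L
eigenval <- eig$values[seq_len(eigen_cnt)]
eigenvect <- eig$vectors[, seq_len(eigen_cnt), drop = FALSE]

print(dim(eigenvect))
```

Here `genmat` plays the role of the matrix returned when `need.genmat = TRUE`, and `eigenvect` has one row per sample and one column per retained component.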

Value

Returns a list of PCA results, including sample IDs, SNP IDs, and principal components (PCs):

`eigenval`	eigenvalues
`eigenvect`	eigenvectors, "# of samples" x "eigen.cnt"
`varprop`	variance proportion for each principal component
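
The relationship between `eigenval` and `varprop` can be sketched as follows, under the assumption that each variance proportion is the corresponding eigenvalue divided by the sum of all eigenvalues. The eigenvalues below are made up for illustration, and the exact denominator used by the package is not stated in this page.

```r
# Hypothetical eigenvalues of a 5-sample genetic covariance matrix,
# sorted in decreasing order
eigenval_all <- c(4.0, 2.5, 1.5, 1.0, 1.0)

# Assumed definition: share of the total variance carried by each component
varprop <- eigenval_all / sum(eigenval_all)

# Cumulative share, useful for deciding how many PCs to keep
cum_varprop <- cumsum(varprop)

# Smallest number of PCs explaining at least 75% of the variance
n_pc <- which(cum_varprop >= 0.75)[1]
print(n_pc)
```

A cumulative curve like `cum_varprop` is one common way to choose how many principal components to carry into downstream association models.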

References

Zheng X, Weir BS. (2016). Eigenanalysis of SNP data with an identity by descent interpretation. Theor Popul Biol. 107:65-76.

Patterson N, Price AL, Reich D. (2006). Population structure and eigenanalysis. PLoS Genet. 2(12):e190.

Galinsky KJ, Bhatia G, Loh PR, Georgiev S, Mukherjee S, Patterson NJ, Price AL. (2016). Fast Principal-Component Analysis Reveals Convergent Evolution of ADH1B in Europe and East Asia. Am J Hum Genet. 98(3):456-72.

Examples

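library(VariantScan)

# Path to the example GDS file shipped with SNPRelate
inp <- SNPRelate::snpgdsExampleFileName()

# PCA on autosomal, polymorphic SNPs with MAF >= 0.01 and missing rate <= 0.1
pca1 <- pca.gds(inp, autosome.only = TRUE, remove.monosnp = TRUE, maf = 0.01, missing.rate = 0.1)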

VariantScan documentation built on June 30, 2022, 5:05 p.m.
