PCACheck | R Documentation |
Function to perform principal component analysis for all samples and to infer sample ancestry.
PCACheck(
seqfile,
remove.samples = NULL,
npcs = 4,
LDprune = TRUE,
missing.rate = 0.1,
ss.cutoff = 300,
maf = 0.01,
hwe = 1e-06,
...
)
seqfile |
SeqSQC object, which includes the merged gds file for study cohort and benchmark. |
remove.samples |
a vector of sample names for removal from PCA calculation. Could be problematic samples identified from previous QC steps, or user-defined samples. |
npcs |
the number of principal components to use for the population prediction in the SVM model. The default value is 4, and it is required to be <= 10. |
LDprune |
whether to use the LD-pruned SNP set; the default is TRUE. |
missing.rate |
to use only the SNPs with a missing rate <= missing.rate (0.1 by default). |
ss.cutoff |
the minimum sample size (300 by default) to apply the MAF filter. This sample size is the sum of study samples and the benchmark samples of the same population as the study cohort. |
maf |
to use only the SNPs with a minor allele frequency >= maf (0.01 by default). |
hwe |
to use only the SNPs with Hardy-Weinberg equilibrium p >= hwe (1e-06 by default). |
... |
Arguments to be passed to other methods. |
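As a rough illustration of how the missing.rate, maf and ss.cutoff arguments interact, the sketch below applies the same kind of thresholds to a toy genotype matrix in base R. It is not the SeqSQC implementation: the helper filter_snps and the toy data are hypothetical, the HWE filter is left out, and the sketch assumes the MAF filter is simply skipped when the sample size is below ss.cutoff.

```r
# Hypothetical toy data: 6 samples (rows) x 5 SNPs (columns),
# genotypes coded as 0/1/2 copies of one allele, NA = missing call.
geno <- matrix(
  c(0, 1, 2, 1, 0, 1,    # SNP1: complete, common
    0, 0, 0, 0, 0, 0,    # SNP2: monomorphic
    1, NA, NA, 2, 0, 1,  # SNP3: 2 of 6 calls missing
    2, 1, 1, 0, 2, 1,    # SNP4: complete, common
    0, 0, 1, 0, 0, 0),   # SNP5: rare allele
  nrow = 6,
  dimnames = list(paste0("S", 1:6), paste0("SNP", 1:5))
)

# Hypothetical helper (not part of SeqSQC): returns the names of SNPs
# that pass the missing-rate filter and, if enough samples, the MAF filter.
filter_snps <- function(geno, missing.rate = 0.1, maf = 0.01, ss.cutoff = 300) {
  miss <- colMeans(is.na(geno))            # per-SNP missing rate
  keep <- miss <= missing.rate
  if (nrow(geno) >= ss.cutoff) {           # assumption: MAF filter skipped otherwise
    af <- colMeans(geno, na.rm = TRUE) / 2 # allele frequency of the counted allele
    keep <- keep & pmin(af, 1 - af) >= maf # minor allele frequency threshold
  }
  colnames(geno)[keep]
}

kept_default <- filter_snps(geno)                           # MAF filter not applied (6 < 300)
kept_strict  <- filter_snps(geno, maf = 0.1, ss.cutoff = 6) # MAF filter applied
```

With the default ss.cutoff only the missing-rate filter acts on this toy matrix; lowering ss.cutoff to the toy sample size also drops the monomorphic and rare SNPs.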
Using LD-pruned autosomal variants (by default), we
calculate the eigenvectors and eigenvalues for principal
component analysis (PCA). We use the benchmark samples as the
training dataset, and predict the population group for each
sample in the study cohort based on the top four
eigenvectors. Samples with discordant predicted and
self-reported population groups are considered problematic. The
function PCACheck performs the PCA and
identifies population outliers in the study cohort.
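The workflow described above (PCA on all samples, training on the benchmark, prediction for the study cohort, flagging discordant samples) can be sketched in base R on simulated data. This is only an illustration under stated assumptions: the genotypes, population labels "A" and "B", and column names are made up, and a nearest-centroid rule is used in place of the SVM model so that the example needs no extra packages.

```r
set.seed(1)
n_snp <- 200

# Two hypothetical populations with clearly different allele frequencies.
p_a <- runif(n_snp, 0.05, 0.5)
p_b <- runif(n_snp, 0.5, 0.95)

# Simulate n samples (rows) of 0/1/2 genotypes given allele frequencies p.
sim <- function(n, p) t(replicate(n, rbinom(length(p), 2, p)))

# 40 benchmark samples (20 per population) followed by 10 study samples.
geno <- rbind(sim(20, p_a), sim(20, p_b), sim(5, p_a), sim(5, p_b))
info <- data.frame(
  sample = paste0("S", 1:50),
  type   = rep(c("benchmark", "study"), c(40, 10)),
  pop    = c(rep(c("A", "B"), each = 20), rep(c("A", "B"), each = 5)),
  stringsAsFactors = FALSE
)
info$pop[41] <- "B"  # mislabel one study sample: reported "B", simulated from "A"

# PCA on all samples; keep the top npcs components.
npcs <- 4
pcs <- prcomp(geno, center = TRUE)$x[, 1:npcs]

# "Train" on benchmark samples: one centroid per population in PC space.
# (Nearest-centroid stand-in for the SVM classifier.)
bench <- info$type == "benchmark"
centroids <- apply(pcs[bench, ], 2, function(v) tapply(v, info$pop[bench], mean))

# Predict each sample's population as the closest benchmark centroid.
nearest <- function(x) {
  rownames(centroids)[which.min(colSums((t(centroids) - x)^2))]
}
info$pred.pop <- apply(pcs, 1, nearest)

# Study samples whose predicted and reported populations disagree.
problematic <- info$sample[info$type == "study" & info$pred.pop != info$pop]
```

In this simulation the deliberately mislabeled sample S41 is the one flagged, mirroring how samples with discordant predicted and self-reported populations are treated as problematic.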
a data frame with sample name, reported population, data resource (benchmark vs study cohort), the first four eigenvectors and the predicted population.
Qian Liu qliu7@buffalo.edu
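# Load the example SeqSQC object shipped with the package (creates `seqfile`)
load(system.file("extdata", "example.seqfile.Rdata", package="SeqSQC"))
# Rebuild the object so that it points at the installed example GDS file
gfile <- system.file("extdata", "example.gds", package="SeqSQC")
seqfile <- SeqSQC(gdsfile = gfile, QCresult = QCresult(seqfile))
# Run the PCA-based ancestry check
seqfile <- PCACheck(seqfile, remove.samples=NULL, LDprune=TRUE, missing.rate=0.1)
# Extract the PCA result and inspect its last rows
res.pca <- QCresult(seqfile)$PCA
tail(res.pca)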