sampleQC | R Documentation |
A wrap-up function for sample QC. It reads variant genotypes in VCF/PLINK format, merges the study cohort with benchmark data, and performs sample QC on the merged dataset.
sampleQC(
vfile = NULL,
output = "sampleqc",
capture.region = NULL,
sample.annot = NULL,
LDprune = TRUE,
vfile.restrict = FALSE,
slide.max.bp = 5e+05,
ld.threshold = 0.3,
format.data = "NGS",
format.file = "vcf",
QCreport = TRUE,
out.report = "report.html",
interactive = TRUE,
results = TRUE,
plotting = TRUE,
...
)
vfile |
VCF or PLINK input file (ped/map/bed/bim/fam with the same
basename). The default is NULL. vfile can be a vector of
character strings; see Details. Could also take file in
|
output |
a character string for name of merged data of SeqSQC
object. The |
capture.region |
the BED file of sequencing capture regions. The default is NULL. For exome-sequencing data, the capture region file must be provided. |
sample.annot |
sample annotation file with 3 columns (with header) in the order of sample id, sample population and sex info. The default is NULL. |
LDprune |
whether to use the LD-pruned SNP set. The default is TRUE. |
vfile.restrict |
whether the input VCF or PLINK file has already been restricted to the capture region. The default is FALSE. |
slide.max.bp |
the window size of SNPs when calculating linkage disequilibrium. The default is 5e+05. |
ld.threshold |
the r^2 threshold for LD-based SNP pruning if
|
format.data |
the data source. The default is |
format.file |
the data format. The default is |
QCreport |
whether to generate the sample QC report in HTML format. The default is TRUE. |
out.report |
the file name for the sample QC report. The
default is |
interactive |
whether to generate interactive plots in the
sample QC report if |
results |
whether to write out the results of each QC step in .txt files. The default is TRUE. |
plotting |
whether to output the plots for each QC step in .pdf files. The default is TRUE. |
... |
Arguments to be passed to other methods. |
For vfile with more than one file name, sampleQC will merge all datasets together if they all contain the same samples. This is useful for combining genetic/genomic data when the VCF data is divided by chromosome.
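As a rough sketch of the multi-file case, the vector of file names can be built programmatically and passed as vfile. The directory, the file naming pattern, and the capture region and annotation paths below are hypothetical; only the argument names come from the usage above.

```r
# Hypothetical per-chromosome VCF files; replace with your own paths.
vcf_dir <- "vcf"
vfiles  <- file.path(vcf_dir, sprintf("cohort_chr%d.vcf", 1:22))

# Run only if the files and the SeqSQC package are actually available.
if (all(file.exists(vfiles)) && requireNamespace("SeqSQC", quietly = TRUE)) {
  seqfile <- SeqSQC::sampleQC(
    vfile          = vfiles,
    output         = file.path(tempdir(), "cohort_merged"),
    capture.region = "capture_regions.bed",   # hypothetical path
    sample.annot   = "sampleAnnotation.txt",  # hypothetical path
    format.data    = "NGS",
    format.file    = "vcf",
    QCreport       = FALSE
  )
}
```

All files in the vector are expected to contain the same samples, as noted above.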
The sample.annot file has 3 columns: column 1 is sample, with sample IDs; column 2 is population, with values of "AFR/EUR/ASN/EAS/SAS"; column 3 is gender, with values of "male/female".
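A minimal sketch of preparing such an annotation file with base R is shown here. The sample IDs are made up for illustration, and the tab separator is an assumption, since the documentation only specifies a header and the column order.

```r
# Hypothetical sample IDs and labels, for illustration only.
annot <- data.frame(
  sample     = c("S001", "S002", "S003"),
  population = c("EUR", "AFR", "EAS"),
  gender     = c("female", "male", "female"),
  stringsAsFactors = FALSE
)

# Check the values against the documented sets before writing.
stopifnot(
  all(annot$population %in% c("AFR", "EUR", "ASN", "EAS", "SAS")),
  all(annot$gender %in% c("male", "female")),
  !anyDuplicated(annot$sample)
)

# Assumption: tab-separated text with a header row.
annot_file <- file.path(tempdir(), "sampleAnnotation.txt")
write.table(annot, annot_file, sep = "\t", quote = FALSE, row.names = FALSE)
```

The resulting path in annot_file could then be supplied as the sample.annot argument.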
a SeqSQC object with the filepath to the gds file which stores the genotype, the summary of samples and variants, and the QCresults including the sample annotation information and all QC results.
Qian Liu qliu7@buffalo.edu
## Not run:
library(SeqSQC)

## Input files shipped with the package: VCF, sample annotation, capture regions.
infile <- system.file("extdata", "example_sub.vcf", package = "SeqSQC")
sample.annot <- system.file("extdata", "sampleAnnotation.txt", package = "SeqSQC")
cr <- system.file("extdata", "CCDS.Hs37.3.reduced_chr1.bed", package = "SeqSQC")
outfile <- file.path(tempdir(), "testWrapUp")

## Run the full sample QC from a VCF file and generate the HTML report.
seqfile <- sampleQC(vfile = infile, output = outfile, capture.region = cr,
                    sample.annot = sample.annot, format.data = "NGS",
                    format.file = "vcf", QCreport = TRUE,
                    out.report = "report.html", interactive = TRUE)
## save(seqfile, file="seqfile.RData")

## Load the saved example object and point it at the example gds file.
load(system.file("extdata", "example.seqfile.Rdata", package = "SeqSQC"))
gfile <- system.file("extdata", "example.gds", package = "SeqSQC")
seqfile <- SeqSQC(gdsfile = gfile, QCresult = QCresult(seqfile))

## Run sample QC on the existing SeqSQC object, without the report.
seqfile <- sampleQC(sfile = seqfile, output = outfile, QCreport = FALSE,
                    out.report = "report.html", interactive = TRUE)
## End(Not run)