LoadVfile: Data preprocessing for VCF or plink input from NGS or GWAS...


View source: R/LoadVfile.R

Description

Reads VCF or PLINK files, merges them with the benchmark data, and returns the result as a SeqSQC object.

Usage

LoadVfile(vfile, output = "sampleqc", capture.region = NULL,
  sample.annot = NULL, LDprune = TRUE, vfile.restrict = FALSE,
  slide.max.bp = 5e+05, ld.threshold = 0.3, format.data = "NGS",
  format.file = "vcf", ...)

Arguments

vfile

VCF or PLINK input file (ped/map/bed/bim/fam with the same basename). vfile can be a character vector of file names; see Details.

output

a character string giving the name of the merged data for the SeqSQC object. dirname(output) is used as the directory in which the QC results and plots are saved. The default is sampleqc in the working directory.

capture.region

the BED file of sequencing capture regions. The default is NULL. For exome-sequencing data, the capture region file must be provided.

sample.annot

sample annotation file with 3 columns (with a header row), in this order: sample ID, sample population, and sex. The default is NULL.

LDprune

whether to use the LD-pruned SNP set. The default is TRUE.

vfile.restrict

whether the input VCF or PLINK file has already been restricted to the capture region. The default is FALSE.

slide.max.bp

the window size, in base pairs, of SNPs used when calculating linkage disequilibrium. The default is 5e+05.

ld.threshold

the r^2 threshold for LD-based SNP pruning if LDprune = TRUE. The default is 0.3.

format.data

the data source. The default is NGS for sequencing data.

format.file

the data format. The default is vcf.

...

Arguments to be passed to other methods.

Details

If vfile contains more than one file name, LoadVfile merges all the datasets, provided they all contain the same samples. This is useful for combining genetic/genomic data when the VCF data is split by chromosome.
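As a minimal sketch of the multi-file case described above, the code below builds a character vector of per-chromosome VCF paths and passes it as vfile. The directory name, the file names, and the output name are hypothetical placeholders, and the call is guarded so that nothing runs unless SeqSQC is installed and every file exists.

```r
# Hypothetical per-chromosome VCF files (assumed names); replace with real paths.
vfiles <- file.path("data", paste0("study_chr", 1:22, ".vcf"))

# Only attempt the merge when SeqSQC is installed and every file exists.
if (requireNamespace("SeqSQC", quietly = TRUE) && all(file.exists(vfiles))) {
  seqfile <- SeqSQC::LoadVfile(
    vfile  = vfiles,                                   # all files must share the same samples
    output = file.path(tempdir(), "merged_sampleqc")   # hypothetical output name
  )
}
```

For exome-sequencing data, capture.region would also need to be supplied, as noted under Arguments.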
The sample.annot file contains 3 columns with column names: column 1 is sample, holding the sample IDs; column 2 is population, with values from "AFR/EUR/ASN/EAS/SAS"; column 3 is gender, with values "male/female".
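The layout of the sample annotation file can be illustrated with a small base R sketch. The sample IDs and values below are made up for illustration, and the tab delimiter is an assumption, since the field separator is not stated here.

```r
# Hypothetical annotation table; the IDs and values are invented for illustration.
annot <- data.frame(
  sample     = c("S1", "S2", "S3"),
  population = c("EUR", "AFR", "EAS"),
  gender     = c("female", "male", "female"),
  stringsAsFactors = FALSE
)

# Check the table against the values listed in Details.
stopifnot(
  all(annot$population %in% c("AFR", "EUR", "ASN", "EAS", "SAS")),
  all(annot$gender %in% c("male", "female")),
  !anyDuplicated(annot$sample)
)

# Write the table with a header row; the tab separator is an assumption.
annot_file <- file.path(tempdir(), "sampleAnnotation_demo.txt")
write.table(annot, annot_file, sep = "\t", quote = FALSE, row.names = FALSE)
```

The resulting path in annot_file could then be passed as the sample.annot argument.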

Value

a SeqSQC object containing the file path to the GDS file, which stores the genotypes and the summary of samples and variants, together with the QC results, which include the sample annotation information.

Author(s)

Qian Liu qliu7@buffalo.edu

Examples

infile <- system.file("extdata", "example_sub.vcf", package="SeqSQC")
sample.annot <- system.file("extdata", "sampleAnnotation.txt", package="SeqSQC")
cr <- system.file("extdata", "CCDS.Hs37.3.reduced_chr1.bed", package="SeqSQC")
outfile <- file.path(tempdir(), "testWrapUp")
seqfile <- LoadVfile(vfile = infile, output = outfile, capture.region = cr,
                     sample.annot = sample.annot)

SeqSQC documentation built on Nov. 8, 2020, 5:03 p.m.