filterpos: Filter variants against known SNP dataset

View source: R/filterpos.R

Description

Filters variants against a known SNP dataset supplied as a VCF, BED or GFF file, or as a user-specified table of positions. For example, variants in VCF format can be filtered against dbSNP, the 1000 Genomes Project dataset, custom VCF data, and so on.

Usage

filterpos(vcf, position=NULL, file="", type="vcf", tbi=FALSE, chr=TRUE,
          tabix="tabix", ...)

Arguments

vcf

A VCF object for filtering.

position

A data.frame or matrix with chromosome names in the first column, start positions in the second column and end positions in the third column (1-based). This can be used to filter against customized VCF data.

file

The file containing the known SNPs.

type

The data format of the input file. It can be 'vcf', 'bed' or 'gff'.

tbi

Logical value. If TRUE, the input file should be indexed by tabix for efficient information retrieval.

chr

Logical value. If TRUE, the chromosome names in the input file are expected to carry the 'chr' prefix, e.g. 'chr1'. If FALSE, the chromosome names are expected without the prefix, e.g. '1'.

tabix

The path to the tabix executable. If NULL, the scanTabix function from Rsamtools is used instead.

...

Additional arguments passed to read.table when reading the input file.
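The three-column layout expected by the position argument can be sketched in base R as follows. The coordinates are made up for illustration, and the helper variables (known, with_prefix, without_prefix) are hypothetical names, not part of VPA. Whether the chr argument is also applied to a position table is not stated here, so the sketch only shows how to bring chromosome names into one convention or the other before calling filterpos.

```r
# Hypothetical known-SNP coordinates; the values are invented for illustration.
# Columns follow the documented order: chromosome, start, end (1-based).
known <- data.frame(
  chrom = c("1", "1", "X"),
  start = c(10177L, 10352L, 60020L),
  end   = c(10177L, 10352L, 60020L),
  stringsAsFactors = FALSE
)

# Add the 'chr' prefix only where it is missing.
with_prefix <- known
with_prefix$chrom <- ifelse(grepl("^chr", known$chrom),
                            known$chrom,
                            paste0("chr", known$chrom))

# Strip the prefix again to get bare chromosome names.
without_prefix <- with_prefix
without_prefix$chrom <- sub("^chr", "", with_prefix$chrom)
```

Either table could then be supplied as position=..., using whichever naming convention matches the chromosome names in the VCF object.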

Details

Variants can be filtered against dbSNP and the 1000 Genomes Project dataset to eliminate common variants.

For example, dbSNP 132 can be downloaded from UCSC (http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/snp132.txt.gz). Columns 2 to 5 of the dataset can be extracted with 'cut' or 'awk' to produce a bed format file, which can then be indexed with 'tabix' for efficient retrieval. Calling filterpos with type="bed" and tbi=TRUE then eliminates the variants that are present in the indexed dataset file.
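The column extraction described above can also be sketched in R rather than with 'cut' or 'awk'. The mock table below assumes the UCSC snp132 column order (bin, chrom, chromStart, chromEnd, name, score, ...); the rows and rs names are invented, and only the first six columns are shown. This is an illustration of the data layout, not the procedure used by the package author.

```r
# Mock rows in the assumed UCSC snp132 column order; values are invented.
snp <- data.frame(
  bin        = c(585L, 585L, 585L),
  chrom      = c("chr1", "chr1", "chr1"),
  chromStart = c(10433L, 10491L, 10518L),
  chromEnd   = c(10433L, 10492L, 10519L),
  name       = c("rsA", "rsB", "rsC"),
  score      = c(0L, 0L, 0L),
  stringsAsFactors = FALSE
)

# Keep columns 2-5 (chrom, chromStart, chromEnd, name), as 'cut -f2-5' would.
bed <- snp[, 2:5]

# Write a headerless, tab-separated bed file to a temporary location.
bed_file <- tempfile(fileext = ".bed")
write.table(bed, bed_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# Read the file back to check the layout.
check <- read.table(bed_file, sep = "\t", stringsAsFactors = FALSE)
```

Before it can be used with tbi=TRUE, the resulting file would still need to be sorted, compressed with bgzip and indexed (for example with 'tabix -p bed'); those steps are outside this sketch.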

Value

The input VCF data are filtered against the known SNP database or the user-specified positions. A list containing the filtered VCF data and the dropped VCF data is returned.
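The split into retained and dropped variants can be illustrated with a toy example in base R. This is a minimal sketch of position-based filtering in general, not VPA's implementation: the data are invented, and the list element names filtered and dropped are hypothetical, since the actual names in the returned list are not given here.

```r
# Toy variant table; CHROM and POS mirror the VCF columns used in the Examples.
variants <- data.frame(
  CHROM = c("1", "1", "2", "2"),
  POS   = c(100L, 250L, 300L, 999L),
  stringsAsFactors = FALSE
)

# Known positions as chromosome, start, end (1-based, inclusive).
known <- data.frame(
  chrom = c("1", "2"),
  start = c(240L, 300L),
  end   = c(260L, 300L),
  stringsAsFactors = FALSE
)

# TRUE for each variant that falls inside any known interval.
hit <- vapply(seq_len(nrow(variants)), function(i) {
  any(known$chrom == variants$CHROM[i] &
      known$start <= variants$POS[i] &
      known$end   >= variants$POS[i])
}, logical(1))

# Hypothetical element names; the real return value may be named differently.
result <- list(filtered = variants[!hit, ], dropped = variants[hit, ])
```

Here the variants at positions 250 and 300 overlap a known interval and end up in the dropped element, while the other two are retained.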

Author(s)

Qiang Hu

See Also

pos2seq

Examples

# vcffile1 <- system.file("extdata", "1151HZ0001.flt.vcf", package="VPA")
# vcfdata1 <- read.vcf(vcffile1)
# vcffile2 <- system.file("extdata", "1151HZ0006.flt.vcf", package="VPA")
# vcfdata2 <- read.vcf(vcffile2)
# vcf <- filterpos(vcfdata1, position=cbind(vcfdata2$CHROM, vcfdata2$POS,
#  vcfdata2$POS), chr=FALSE)

VPA documentation built on May 2, 2019, 4:45 p.m.