View source: R/LoadFiltering.R
Loads data from study subjects and performs position-level quality filtering. The index.txt file contains the group status and VCF file locations of each subject. The function takes index.txt as input and loads the variant and sequence call files automatically.
Argument | Description
--- | ---
file | Formatted input file containing the annotation information of the study subjects.
datadir | The working directory containing the index file and variant data. If NULL, the annotation file must give the absolute paths of the variant files.
filtering | Logical value. Whether to filter the VCF data by the specified quality criteria.
alter.PL | Phred-scaled genotype likelihood threshold used to define a variant call. The PL information can be extracted from the PL field (both GATK and Samtools) in the VCF data.
alter.AD | The minimum depth of the variant allele. The variant allele depth can be extracted from the AD (GATK) or DP4 (Samtools) field in the VCF data.
alter.ADP | The minimum percentage of reads at the position that carry the variant allele.
QUAL | Threshold for the Phred-scaled quality score of the variant call. The QUAL information can be extracted from the QUAL column (both GATK and Samtools) in the VCF data.
DP | The minimum and maximum position-level read depth. The DP information can be extracted from the DP field (both GATK and Samtools) in the VCF data.
GQ | Threshold for the Phred-scaled genotype quality, i.e. the confidence in the most likely genotype at the position of interest. The GQ information can be extracted from the GQ field (both GATK and Samtools) in the VCF data. If NULL, this criterion is ignored.
FILTER | 'NULL' or 'PASS'. GATK labels the quality status of each position in the FILTER column of its VCF output; in VCF data produced by Samtools, this column carries no information. If 'NULL' is set, all variants are parsed. If 'PASS' is set, only variants labeled 'PASS' are parsed.
tabix | The file path of the tabix executable.
parallel | If TRUE, the function runs in parallel mode.
pn | The number of CPUs to use when parallel is TRUE.
type | MPI type. See details in
... | Additional arguments passed to the method.
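The alter.AD and alter.ADP thresholds depend on the variant-allele depth, which GATK and Samtools encode differently. Below is a minimal base-R sketch, assuming the usual VCF conventions: AD lists the reference depth first and then the alternate depths, and DP4 holds ref-forward, ref-reverse, alt-forward and alt-reverse read counts. The helper names are hypothetical and not part of VPA, and treating alter.ADP as a 0-100 percentage is also an assumption.

```r
# Hypothetical helpers (not part of VPA): variant-allele depth and total depth
# from a single VCF field value.

# GATK AD: comma-separated depths, reference allele first, e.g. "12,8".
alt_depth_from_AD <- function(ad) {
  depths <- as.integer(strsplit(ad, ",", fixed = TRUE)[[1]])
  c(alt = sum(depths[-1]), total = sum(depths))
}

# Samtools DP4: ref-forward, ref-reverse, alt-forward, alt-reverse counts.
alt_depth_from_DP4 <- function(dp4) {
  counts <- as.integer(strsplit(dp4, ",", fixed = TRUE)[[1]])
  c(alt = counts[3] + counts[4], total = sum(counts))
}

# Percentage of reads carrying the variant allele (assumed 0-100 scale,
# the quantity an alter.ADP-style threshold would be compared against).
alt_percent <- function(depth) {
  100 * depth[["alt"]] / depth[["total"]]
}

gatk <- alt_depth_from_AD("12,8")
sam  <- alt_depth_from_DP4("5,7,3,5")
alt_percent(gatk)  # 40
alt_percent(sam)   # 40
```

A position would then pass the allele-depth criteria when the alt depth is at least alter.AD and the percentage is at least alter.ADP.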
file
The input file contains the annotation information of each sample, one row per sample. Its four tab-separated columns are the sample name (required), group status (required), variant call file name (required) and sequence call file name (optional). The sample name column lists the sample names. The group status column lists the group (e.g., aggressive, benign or normal) that each sample belongs to. The variant call file name column lists the path of the VCF-formatted variant call file, and the sequence call file name column lists the path of the compressed VCF sequence call file. High-volume, tab-delimited VCF data can be compressed efficiently with the bgzip program and retrieved with the tabix program, both from the open-source Samtools package. If the VCF files are compressed with bgzip, tabix must be installed, and its path must be passed to the function if it is not on the system PATH.
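To make the layout concrete, here is a hypothetical index file written and read back with base R. The sample names and file names are invented, and because this page does not say whether a header row is expected, the sketch writes none (an assumption). Relative file names like these assume datadir points at their directory; with datadir = NULL they would need to be absolute paths.

```r
# Hypothetical index file: sample name, group status, variant call file,
# and (optional) bgzip-compressed sequence call file, tab-separated.
index <- data.frame(
  sample = c("S1", "S2"),
  group  = c("aggressive", "benign"),
  vcf    = c("S1.var.vcf", "S2.var.vcf"),
  seq    = c("S1.seq.vcf.gz", "S2.seq.vcf.gz"),
  stringsAsFactors = FALSE
)

index_path <- file.path(tempdir(), "index.txt")
write.table(index, index_path, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)  # no header row (assumed)

# Read it back to check the layout: one row per sample, four columns.
idx <- read.delim(index_path, header = FALSE, stringsAsFactors = FALSE)
str(idx)
```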
Quality criteria
Details of the quality scores in VCF data can be found at http://www.1000genomes.org/node/101.
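The position-level criteria can be pictured as a row filter over per-position VCF values. The sketch below is not VPA's implementation: it is a hypothetical base-R function over an already-parsed data frame, with invented default thresholds, showing how a QUAL cutoff, a DP range, an optional GQ cutoff and FILTER = 'PASS' combine. Whether LoadFiltering uses inclusive or strict comparisons at each boundary is an assumption.

```r
# Hypothetical sketch (not VPA's code): keep positions meeting QUAL, DP-range,
# optional GQ and optional FILTER == "PASS" criteria.
filter_positions <- function(calls, QUAL = 20, DP = c(10, 500),
                             GQ = NULL, FILTER = NULL) {
  keep <- calls$QUAL >= QUAL &
    calls$DP >= DP[1] & calls$DP <= DP[2]
  if (!is.null(GQ)) {
    keep <- keep & calls$GQ >= GQ          # GQ = NULL skips this criterion
  }
  if (identical(FILTER, "PASS")) {
    keep <- keep & calls$FILTER == "PASS"  # FILTER = NULL keeps every label
  }
  calls[keep & !is.na(keep), , drop = FALSE]  # treat NA as "fails"
}

calls <- data.frame(
  POS    = c(101, 202, 303, 404),
  QUAL   = c(50, 15, 60, 80),
  DP     = c(30, 40, 5, 100),
  GQ     = c(99, 99, 99, 10),
  FILTER = c("PASS", "PASS", "PASS", "LowQual"),
  stringsAsFactors = FALSE
)

# Only position 101 passes all four criteria.
strict <- filter_positions(calls, QUAL = 20, DP = c(10, 500), GQ = 20, FILTER = "PASS")
# Without GQ/FILTER, position 404 also passes.
loose <- filter_positions(calls)
```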
parallel
By default, this function extracts calls in sequential mode. If parallel is TRUE, it extracts calls in parallel mode, which requires the packages Rmpi and snowfall.
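LoadFiltering itself relies on Rmpi and snowfall, where the usual pattern is sfInit(parallel = TRUE, cpus = ..., type = ...), then sfLapply(), then sfStop(). The sketch below swaps in the base parallel package so it runs without an MPI installation; it only illustrates dispatching one task per sample to pn workers and checking that the result matches sequential mode. load_one() is a hypothetical stand-in for reading one sample's calls.

```r
library(parallel)

# Hypothetical stand-in for loading one sample's calls: it just records the
# file name and the length of that name.
load_one <- function(vcf_path) {
  list(file = vcf_path, n = nchar(vcf_path))
}

vcf_files <- c("S1.var.vcf", "S2.var.vcf", "S3.var.vcf")
pn <- 2  # number of worker processes, analogous to the pn argument

cl <- makeCluster(pn)                      # PSOCK cluster, no MPI needed
res_par <- parLapply(cl, vcf_files, load_one)
stopCluster(cl)

res_seq <- lapply(vcf_files, load_one)     # sequential mode
identical(res_par, res_seq)
```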
The value returned is a varlist, including vcflist, VarIndex and Samples.
Component | Description
--- | ---
varlist | A list of vcf objects, one for each sample. If filtering is TRUE, the variant data are filtered by the specified quality criteria.
VarIndex | The indexes of all variant positions. TRUE denotes the presence of a variant, FALSE the absence of a variant, and NA low coverage.
Sample | Sample annotation from the input index file.
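Because VarIndex is three-valued, per-sample counts should keep NA (low coverage) separate from FALSE (no variant). A small base-R sketch, assuming VarIndex can be viewed as a logical matrix with positions in rows and samples in columns; that layout is not specified on this page, and the position and sample names are invented.

```r
# Hypothetical VarIndex: positions in rows, samples in columns (assumed layout).
VarIndex <- matrix(
  c(TRUE, FALSE, NA,
    TRUE, TRUE,  FALSE,
    NA,   FALSE, TRUE),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("chr1:101", "chr1:202", "chr1:303"), c("S1", "S2", "S3"))
)

# Per sample: variant present, variant absent, low-coverage positions.
summary_tab <- rbind(
  variant      = colSums(VarIndex, na.rm = TRUE),
  no_variant   = colSums(!VarIndex, na.rm = TRUE),
  low_coverage = colSums(is.na(VarIndex))
)
summary_tab
```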
Qiang Hu
## Load and quality-filter the example data shipped with VPA
#setwd(system.file("extdata", package="VPA"))
#varflt <- LoadFiltering(file="index1.txt", filtering=TRUE, alter.PL=20,
#                        alter.AD=3)
## Pass the filtered data to Patterning() with a group pattern
#pattern <- cbind(A=c(1/4,1), B=c(0,0))
#varRes1 <- Patterning(varflt, pattern, var.PL=c(FALSE, TRUE))