VariantFilteringParam-class: VariantFiltering parameter class

VariantFiltering parameter class

Description

The class VariantFilteringParam eases configuring the call to the functions that filter input genetic variants according to a desired segregating inheritance model (xLinked(), autosomalRecessiveHomozygous(), etc.).

Usage

VariantFilteringParam(vcfFilename, pedFilename=NA_character_,
                      bsgenome="BSgenome.Hsapiens.1000genomes.hs37d5",
                      orgdb="org.Hs.eg.db",
                      txdb="TxDb.Hsapiens.UCSC.hg19.knownGene",
                      snpdb="SNPlocs.Hsapiens.dbSNP144.GRCh37",
                      weightMatricesFilenames=NA,
                      weightMatricesLocations=rep(list(variantLocations()), length(weightMatricesFilenames)),
                      weightMatricesStrictLocations=rep(list(FALSE), length(weightMatricesFilenames)),
                      radicalAAchangeFilename=file.path(system.file("extdata",
                                                                    package="VariantFiltering"),
                                                        "AA_chemical_properties_HanadaGojoboriLi2006.tsv"),
                      codonusageFilename=file.path(system.file("extdata",
                                                               package="VariantFiltering"),
                                                   "humanCodonUsage.txt"),
                      geneticCode=getGeneticCode("SGC0"),
                      allTranscripts=FALSE,
                      regionAnnotations=list(CodingVariants(), IntronVariants(),
                                             FiveSpliceSiteVariants(), ThreeSpliceSiteVariants(),
                                             PromoterVariants(), FiveUTRVariants(), ThreeUTRVariants()),
                      intergenic=FALSE,
                      otherAnnotations=c("MafDb.1Kgenomes.phase1.hs37d5",
                                         "PolyPhen.Hsapiens.dbSNP131",
                                         "SIFT.Hsapiens.dbSNP137",
                                         "phastCons100way.UCSC.hg19",
                                         "humanGenesPhylostrata"),
                      geneKeytype=NA_character_,
                      yieldSize=NA_integer_)
## S4 method for signature 'VariantFilteringParam'
show(object)
## S4 method for signature 'VariantFilteringParam'
x$name
## S4 method for signature 'VariantFilteringParam'
names(x)

Arguments

vcfFilename

Character string of the input VCF file name.

pedFilename

Character string of the pedigree file name in PED format.

bsgenome

Character string of a genome annotation package (BSgenome.Hsapiens.1000genomes.hs37d5 by default).

orgdb

Character string of a gene-centric annotation package (org.Hs.eg.db by default).

txdb

Character string of a transcript-centric annotation package (TxDb.Hsapiens.UCSC.hg19.knownGene by default). The package GenomicFeatures provides infrastructure to build such annotation packages from different sources such as online UCSC tracks, Biomart tables, or GFF files.

snpdb

Character string of a SNP-centric annotation package (SNPlocs.Hsapiens.dbSNP144.GRCh37 by default).

weightMatricesFilenames

Character vector of filenames of position weight matrices for binding site recognition. The default NA value indicates that no binding sites will be scored. To use this feature to score, for instance, splice sites in human, assign to this argument the value returned by the function spliceSiteMatricesHuman(). See the files (hsap.donors.hcmc10_15_1.ibn and hsap.acceptors.hcmc10_15_1.ibn) returned by this function for details on their format.

weightMatricesLocations

Keywords of the annotated variant locations at which a weight matrix will be used for scoring binding sites. This argument is only used when weightMatricesFilenames is not NA. In that case, when more than one matrix is provided, this argument should be a list of character vectors with as many elements as matrices given in weightMatricesFilenames. The possible values can be obtained by typing variantLocations().

weightMatricesStrictLocations

Logical vector flagging whether a weight matrix should score binding sites strictly within the boundaries of the given locations. This argument is only used when weightMatricesFilenames is not NA. In that case, when more than one matrix is provided, this argument should be a list of logical vectors with as many elements as matrices given in weightMatricesFilenames.
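
The following base R sketch illustrates the shape that weightMatricesLocations and weightMatricesStrictLocations are expected to take when two matrices are given. It mirrors the default expressions shown in the Usage section. The file names and location keywords below are hypothetical placeholders; in practice the keywords would come from variantLocations().

```r
## Hypothetical file names, for illustration only.
wmFilenames <- c("donors.ibn", "acceptors.ibn")

## Placeholder keywords standing in for the output of variantLocations().
allLocations <- c("locA", "locB", "locC")

## One list element per weight matrix, as in the default argument values.
wmLocations <- rep(list(allLocations), length(wmFilenames))
wmStrictLocations <- rep(list(FALSE), length(wmFilenames))

## Restrict the first matrix to a single (placeholder) location and
## request strict scoring within its boundaries.
wmLocations[[1]] <- "locA"
wmStrictLocations[[1]] <- TRUE

## Both lists must have as many elements as there are matrices.
stopifnot(length(wmLocations) == length(wmFilenames),
          length(wmStrictLocations) == length(wmFilenames))
```

Keeping both arguments as lists of the same length as weightMatricesFilenames lets each matrix have its own set of locations and its own strictness flag.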

radicalAAchangeFilename

Name of a tab-separated text file containing chemical properties of amino acids. These properties are interpreted such that amino acid changes within a property are considered "conservative" and between properties are considered "radical". See the default file (AA_chemical_properties_HanadaGojoboriLi2006.tsv) for details on its format.
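
As a minimal sketch of the conservative versus radical distinction described above, the base R code below classifies an amino acid change using a single toy property. The helper classifyChange() and the four-residue table are hypothetical and do not reproduce the format or contents of the default file.

```r
## Toy property table (polarity) for four amino acids; illustrative only,
## not the contents of AA_chemical_properties_HanadaGojoboriLi2006.tsv.
aaPolar <- c(Ser=TRUE, Thr=TRUE, Leu=FALSE, Val=FALSE)

## Hypothetical helper: a change is "conservative" when both amino acids
## share the same value of the property, and "radical" otherwise.
classifyChange <- function(ref, alt, property) {
  if (property[[ref]] == property[[alt]]) "conservative" else "radical"
}

classifyChange("Ser", "Thr", aaPolar)  # both polar
classifyChange("Ser", "Leu", aaPolar)  # polar to nonpolar
```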

codonusageFilename

Name of a text file containing the codon usage.

geneticCode

Named character vector of length 64 describing the genetic code. The default value is getGeneticCode("SGC0"), the standard genetic code. An alternative genetic code, for instance, is getGeneticCode("SGC1"), the vertebrate mitochondrial genetic code. See getGeneticCode in the Biostrings package for further details.

allTranscripts

Logical. This option allows the user to choose between working with all the transcripts affected by the variant (allTranscripts=TRUE) or with only one transcript per variant.

regionAnnotations

List of VariantType-class objects defining what regions to annotate.

intergenic

Logical. When TRUE, the intergenic variants are also annotated.

otherAnnotations

Character vector of names of annotation packages or annotation objects.

geneKeytype

Character vector of the type of gene identifier provided by the transcript-centric (TxDb) annotation package and used to interrogate the organism-centric (OrgDb) annotation package. The default value (NA_character_) indicates that the identifiers will be assumed to be Entrez identifiers, with two exceptions. If the values in the GENEID column returned by the TxDb package start with ENSG, they will be assumed to be Ensembl gene identifiers. If they start with one of NM_, NP_, NR_, XM_, XP_, XR_ or YP_, they will be assumed to be RefSeq gene identifiers.
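
The prefix rules described above can be sketched in base R as follows. The helper guessGeneKeytype() is hypothetical and is not part of the package. The returned strings ("ENTREZID", "ENSEMBL", "REFSEQ") and the requirement that all identifiers share the prefix are assumptions made for illustration, since the package's internal logic may differ.

```r
## Hypothetical helper, not the package's implementation: guess the key
## type of a vector of gene identifiers from their prefixes.
guessGeneKeytype <- function(geneids) {
  ids <- geneids[!is.na(geneids)]
  if (length(ids) == 0)
    return("ENTREZID")                      # nothing to inspect: default
  if (all(grepl("^ENSG", ids)))
    return("ENSEMBL")                       # Ensembl gene identifiers
  if (all(grepl("^(NM|NP|NR|XM|XP|XR|YP)_", ids)))
    return("REFSEQ")                        # RefSeq identifiers
  "ENTREZID"                                # default assumption
}

guessGeneKeytype(c("ENSG00000139618", "ENSG00000141510"))
guessGeneKeytype(c("NM_000059", "NR_046018"))
guessGeneKeytype(c("675", "7157"))
```

When the identifiers follow none of these conventions, setting geneKeytype explicitly avoids relying on this kind of guess.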

yieldSize

Number of variants to yield each time the input VCF file is read. This argument is passed to the TabixFile function when opening the input VCF file, and it allows iterating through the variants in chunks of the given size to limit memory requirements. Its default value (NA_integer_) implies that the whole input VCF file will be read into main memory.

object

A VariantFilteringParam object created through VariantFilteringParam().

x

A VariantFilteringParam object created through VariantFilteringParam().

name

Slot name of a VariantFilteringParam object. Use names() to find out what these slots are.

Details

The class VariantFilteringParam serves the purpose of simplifying the call to the inheritance model function and its subsequent annotation and filtering steps. It also groups all the parameters that the user can customize (e.g., newer versions of the annotation packages, when available).

The constructor function VariantFilteringParam() creates a VariantFilteringParam object that is used as an input argument to other functions such as autosomalRecessiveHomozygous().

The method names() allows one to see the names of the slots from a VariantFilteringParam object. Using the $ operator, one can retrieve the values of these slots in an analogous way to a list.

Value

A VariantFilteringParam object is returned by the function VariantFilteringParam().

Author(s)

D.M. Elurbe, P. Puigdevall and R. Castelo

Examples

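## Build a parameter object from the example VCF and PED files shipped
## with the package, skipping the SNP and other annotation packages.
vfpar <- VariantFilteringParam(system.file("extdata", "CEUtrio.vcf.bgz", package="VariantFiltering"),
                               system.file("extdata", "CEUtrio.ped", package="VariantFiltering"),
                               snpdb=character(0), otherAnnotations=character(0))
vfpar           ## show() method
names(vfpar)    ## slot names available through the $ operator
vfpar$vcfFiles  ## retrieve the value of one slot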

rcastelo/VariantFiltering documentation built on Oct. 23, 2024, 5:23 p.m.