VariantFilteringResults-class | R Documentation |
The VariantFilteringResults
class is used to store the kind of object obtained as a result of an analysis using the functions unrelatedIndividuals()
, autosomalRecessiveHomozygous()
, autosomalRecessiveHeterozygous()
, autosomalDominant()
, deNovo()
and xLinked()
. Its purpose is to ease the task of filtering and prioritizing the variants annotated by those functions.
Variants are stored within a VariantFilteringResults
object using a VRanges
object, which also holds the variant annotations in its metadata columns. VariantFiltering adds the following core set of annotations.
Region where the variant is located (coding, intronic, splice site, promoter, ...) as given by the function locateVariants()
from the VariantAnnotation
package.
Start position of the variant within the region defined by the LOCATION
annotation.
Gene identifier derived with the transcript-centric annotation package given in the txdb
argument
of the VariantFilteringParam()
function, typically an Entrez Gene identifier.
Gene name given by HGNC derived with the gene-centric annotation package given in the orgdb
argument
of the VariantFilteringParam()
function.
Type of variant, either a single nucleotide variant (SNV), an insertion,
a deletion, a multinucleotide variant (MNV) or a deletion followed by an
insertion (Delins). These types are determined using functions
isSNV()
,
isInsertion()
,
isDeletion()
,
isSubstitution()
and
isDelins()
from the
VariantAnnotation
package.
dbSNP identifier derived by position from the annotation packages given
in the snpdb
argument of the VariantFilteringParam()
function.
Location of the variant along the processed transcript, when the variant belongs to an exonic region.
Consequence of the variant when located in the coding region (synonymous,
nonsynonymous, missense, nonsense or frameshift) as given by the function
predictCoding()
from the VariantAnnotation
package.
Transcript name extracted from the TxDb
annotation package given
by the txdb
argument of the VariantFilteringParam()
function.
HGVS description of the variant at genomic level.
HGVS description of the variant at coding level.
HGVS description of the variant at protein level.
OMIM identifier of the gene associated with the variant, derived with the gene-centric
annotation package given in the orgdb
argument
of the VariantFilteringParam()
function.
In the case of coding variants, whether the amino acid change is conservative or
radical according to the matrix of amino acid biochemical properties given in the
argument radicalAAchangeFilename
of the VariantFilteringParam()
function.
Score for the cryptic 5'ss for the REF allele with respect to the ALT allele.
Maximum score for a potential cryptic 5'ss created by the ALT allele.
Position of the allele with respect to the position of the dinucleotide GT
,
considering those as positions 1 and 2.
Score for the cryptic 3'ss for the REF allele with respect to the ALT allele.
Maximum score for a potential cryptic 3'ss created by the ALT allele.
Position of the allele with respect to the position of the dinucleotide AG
,
considering those as positions 1 and 2.
A VariantFilteringResults
has the following set of accessor methods.
length(x)
: total number of variants stored internally within the
VRanges
object. Note that this number will typically be larger than the number
of variants in the input VCF object because each of them is copied for each combination
of alternate allele, annotated region and sample.
param(x)
: returns the VariantFilteringParam
input parameter
object employed in the call that produced the VariantFilteringResults
object x
.
inheritanceModel(x)
: returns the model of inheritance employed in the
call that produced the VariantFilteringResults
object x
.
samples(object)
: active samples from which the current filtered variants were derived. If
object
was obtained with unrelatedIndividuals()
, then the replace method
samples(object)<-
can be used to restrict the subset of active samples. In every other case
(autosomalDominant()
, etc.) the active samples cannot be changed.
resetSamples(object)
: sets back as active samples the initial set of samples specified in the input parameter object.
sog(x)
: Sequence Ontology (SO) graph (actually, an acyclic digraph)
returned as a graphNEL
object, whose vertices are SO terms,
edges represent ontology relationships and vertex attributes vcfIdx
and
varIdx
contain what variants are annotated to each SO term. These annotations
can be directly retrieved from the SO graph with the nodeData()
function from the graph
package. The summary()
function described
in this manual page allows one to tally the number of variants in each SO term throughout
the entire SO hierarchy.
bamFiles(x)
: access and update the BamViews
object containing
references to BAM files from which the input VCF files were derived. Initially this is empty.
allVariants(x, groupBy="sample")
: returns a VRangesList
object with all variants grouped by default by sample. Using the argument groupBy
we can specify any metadata column to be used to group variants. If the value given to
groupBy
does not correspond to any such columns, a
VRanges
object with all variants together is returned.
filteredVariants(x, groupBy="sample")
: it works like allVariants(x)
but instead of returning all variants, it returns only those that pass the active
filters; see filters()
and cutoffs()
below.
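The row expansion described for length(x) and the grouping behaviour of the groupBy argument can be illustrated with a toy sketch in base R. It uses plain data frames instead of VRanges and VRangesList objects, and every name in it (alleles, regions, sample_tbl, group_variants) is hypothetical and chosen only for this illustration; it is not how VariantFiltering is implemented.

```r
# Toy stand-in for annotated variants: plain data frames, NOT a VRanges object.
# Two input variants: v1 has two alternate alleles, v2 has one.
alleles <- data.frame(variant = c("v1", "v1", "v2"),
                      ALT = c("A", "T", "G"),
                      stringsAsFactors = FALSE)
# v1 is annotated to two regions, v2 to one.
regions <- data.frame(variant = c("v1", "v1", "v2"),
                      LOCATION = c("coding", "promoter", "intron"),
                      stringsAsFactors = FALSE)
sample_tbl <- data.frame(sample = c("S1", "S2"), stringsAsFactors = FALSE)

# One row per combination of alternate allele, annotated region and sample.
expanded <- merge(merge(alleles, regions, by = "variant"), sample_tbl, by = NULL)
nrow(expanded)  # 10 rows, although there are only 2 input variants

# Hypothetical helper mimicking the documented groupBy behaviour: group by a
# column when it exists, otherwise return all rows together.
group_variants <- function(df, groupBy = "sample") {
  if (groupBy %in% names(df)) split(df, df[[groupBy]]) else df
}

by_sample <- group_variants(expanded)                          # list, one element per sample
by_location <- group_variants(expanded, groupBy = "LOCATION")  # list, one element per region
ungrouped <- group_variants(expanded, groupBy = "notAColumn")  # single data frame
```

With a real VariantFilteringResults object the analogous calls would be allVariants(x) and allVariants(x, groupBy="LOCATION").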
The variants contained in a VariantFilteringResults
object can be filtered using
the FilterRules
mechanism, defined in the S4Vectors
package,
by using the functions filters()
and cutoffs()
described below. There are
additional functions, also described in this section, to facilitate this task on the set
of core annotations provided by VariantFiltering
.
filters(x)
: get the current FilterRules
object that defines
the available set of filter criteria that one can use to filter the variants contained in
x
. This can also be used as a replacement function filters(x)<-
to update
this set of filters. The actual filtering is done when calling the function
filteredVariants()
.
filtersMetadata(x)
: metadata about the available filters.
cutoffs(x)
: get cutoffs from the available filters.
change(x, cutoff)<-
: change cutoffs from the available filters. Here, argument x
is a CutoffsList
object given by the method cutoffs()
, and argument cutoff
is a character string with the name of the cutoff.
softFilterMatrix(x)
: get and update the variant by filter matrix; see
softFilterMatrix()
in the VariantAnnotation
package.
dbSNPpresent(x)
: flag whether to filter variants present or absent from dbSNP (NA
-do not filter-, "Yes"
, "No"
).
variantType(x)
: filter by type of variant ("SNV"
, "Insertion"
, "Deletion"
, "MNV"
, "Delins"
).
variantLocation(x)
: filter by variant location ("coding"
, "intron"
, "threeUTR"
, "fiveUTR"
, "intergenic"
, "spliceSite"
, "promoter"
).
variantConsequence(x)
: filter by variant consequence ("synonymous"
, "nonsynonymous"
, "frameshift"
, "nonsense"
, "not translated"
).
aaChangeType(x)
: filter by type of change of amino acid ("Radical"
, "Conservative"
).
OMIMpresent(x)
: flag whether to filter variants whose associated genes are present or absent from OMIM (NA
-do not filter-, "Yes"
, "No"
).
naMAF(x)
: flag whether NA maximum MAF values should be included in the filtered variants.
maxMAF(x)
: maximum MAF value that a variant may have among the selected populations.
minPhastCons(x)
: minimum phastCons score for nucleotide conservation (NA
-do not filter-, [0-1]).
minPhylostratum(x)
: minimum phylostratum for gene conservation (NA
-do not filter-, [1-20]).
MAFpop(x)
: selection of populations to use when filtering by maximum MAF value.
minScore5ss(x)
: minimum weight matrix score on a cryptic 5'ss. NA
indicates this filter is not applied.
minScore3ss(x)
: minimum weight matrix score on a cryptic 3'ss. NA
indicates this filter is not applied.
minCUFC(x)
: minimum absolute codon-usage log2 fold-change.
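How MAFpop(), maxMAF() and naMAF() interact can be sketched with a toy filter in base R. This only illustrates the semantics described above and is not the package's implementation: the function pass_maf, the population labels and the MAF values are hypothetical, and the inclusive comparison (<=) is an assumption.

```r
# Hypothetical per-population MAF values for three variants (toy data).
maf <- data.frame(AFR = c(0.01, 0.20, NA),
                  EUR = c(0.03, 0.01, NA),
                  row.names = c("v1", "v2", "v3"))

# Toy filter, NOT the package's code. The inclusive comparison is an assumption.
pass_maf <- function(maf, MAFpop, maxMAF, naMAF) {
  sel <- as.matrix(maf[, MAFpop, drop = FALSE])
  # Largest MAF across the selected populations; NA when no value is known.
  top <- apply(sel, 1, function(r) {
    if (all(is.na(r))) NA_real_ else max(r, na.rm = TRUE)
  })
  # Variants without a MAF value pass only when naMAF is TRUE.
  ifelse(is.na(top), naMAF, top <= maxMAF)
}

# Both populations, NA values excluded: only v1 passes.
pass_maf(maf, MAFpop = c("AFR", "EUR"), maxMAF = 0.05, naMAF = FALSE)

# Only EUR, NA values included: all three variants pass.
pass_maf(maf, MAFpop = "EUR", maxMAF = 0.05, naMAF = TRUE)
```

The second call shows that narrowing the selected populations can let a variant through (v2 here) that a broader selection would have removed.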
The following functions help in summarizing, visualizing and reporting the filtered variants.
summary(object, method=c("SO", "SOfull", "bioc"))
: tally the current
filtered set of variants to features. By default, features are Sequence
Ontology (SO) terms to which variants are annotated by VariantFiltering
.
The method
argument allows the user to change this default setting to
tallying throughout the entire SO hierarchy. Both options, SO
and
SOfull
can be used in combination with the cutoff SOterms
; see
the vignette. The option method="bioc"
considers as features the
regions and consequences annotated by functions
locateVariants()
and
predictCoding()
from the VariantAnnotation
package. The result is returned as a data.frame
object.
plot(x, what, sampleName, flankingNt=20, showAlnNtCutoff=200, isPaired=FALSE, ...)
: plots variants using the Gviz
package. The argument what
can be
either a character vector specifying gene or variant identifiers or a
chromosome name, or a GRanges
object specifying a genomic region. The
argument sampleName
is optional and allows the user to plot the aligned
reads and coverage from a specific sample, located in the plotted region, when
the corresponding BAM file has been linked to the object with bamFiles()
.
The argument flankingNt
is a number of nucleotides to extend the plotting
region derived from the argument what
. The argument showAlnNtCutoff
is the region size cutoff below which it will be attempted to plot the aligned reads.
The argument isPaired
is passed directly to the Gviz
function
AlignmentsTrack()
which streams over the BAM file to plot the reads
and sets whether the BAM file contains single (default) or paired-end reads.
Further arguments in ...
are passed to the Gviz
function
plotTracks()
and can be used to fine-tune the final plot; see
the vignette of Gviz
to find out what these arguments are.
reportVariants(x, type=c("shiny", "csv", "tsv"), file=NULL)
: builds a report from the VariantFilteringResults
object x
. Using
the type
argument, the report can take the form of a flat file in CSV
or TSV format or a web shiny
app (default) that enables applying
functional annotation filters in an interactive manner.
When the shiny
app is closed this method returns a
VariantFilteringResults
object with the corresponding filters
switched on or off according to how the app has been interactively used.
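The difference between tallying variants only at the terms they are directly annotated to and tallying throughout the hierarchy, as described for summary() with method="SO" and method="SOfull", can be sketched in base R. The mini-ontology, the term names and the variant indices below are hypothetical, and the sketch assumes that a hierarchy-wide tally counts each variant once at every ancestor of its annotated term; it does not use the graphNEL object returned by sog().

```r
# Hypothetical mini-ontology as child -> parent(s); NOT the real SO graph.
parents <- list(termA = character(0),
                termB = "termA",
                termC = "termA",
                termD = c("termB", "termC"))

# Variant indices directly annotated to each term (toy data).
direct <- list(termA = integer(0),
               termB = 1:2,
               termC = 3L,
               termD = c(2L, 4L))

# All ancestors of a term, following every parent link upwards.
ancestors <- function(term) {
  p <- parents[[term]]
  unique(c(p, unlist(lapply(p, ancestors))))
}

# For each term, the union of variant indices annotated to it or to any
# descendant, so a variant reached through two paths is counted once.
full <- lapply(names(parents), function(term) {
  below <- Filter(function(u) term == u || term %in% ancestors(u),
                  names(parents))
  sort(unique(unlist(direct[below])))
})
names(full) <- names(parents)

tally <- data.frame(term = names(parents),
                    SO = lengths(direct),
                    SOfull = lengths(full),
                    row.names = NULL)
tally
```

Working with sets of variant indices rather than summing counts avoids double counting when a term has more than one parent, as termD does here.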
R. Castelo
## Not run:
library(VariantFiltering)
CEUvcf <- file.path(system.file("extdata", package="VariantFiltering"),
"CEUtrio.vcf.gz")
CEUped <- file.path(system.file("extdata", package="VariantFiltering"),
"CEUtrio.ped")
param <- VariantFilteringParam(vcfFileNames=CEUvcf, pedFileName=CEUped)
reHo <- autosomalRecessiveHomozygous(param)
naMAF(reHo) <- FALSE
maxMAF(reHo) <- 0.05
reHo
head(filteredVariants(reHo))
reportVariants(reHo, type="csv", file="reHo.csv")
## End(Not run)