Filter-classes: Filters supported by ensembldb

Description Usage Arguments Details Value Note Author(s) See Also Examples

Description

ensembldb supports most of the filters from the AnnotationFilter package to retrieve specific content from EnsDb databases. These filters can be passed to the methods such as genes with the filter parameter or can be added as a global filter to an EnsDb object (see addFilter for more details). Use the supportedFilters to list all filters supported for EnsDb object.

supportedFilters returns a data.frame with the names of all filters and the corresponding field supported by the EnsDb object.

seqnames: accessor for the sequence names of the GRanges object within a GRangesFilter

seqnames: accessor for the seqlevels of the GRanges object within a GRangesFilter

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
OnlyCodingTxFilter()

ProtDomIdFilter(value, condition = "==")

UniprotDbFilter(value, condition = "==")

UniprotMappingTypeFilter(value, condition = "==")

TxSupportLevelFilter(value, condition = "==")

## S4 method for signature 'EnsDb'
supportedFilters(object, ...)

## S4 method for signature 'GRangesFilter'
seqnames(x)

## S4 method for signature 'GRangesFilter'
seqlevels(x)

Arguments

value

The value(s) for the filter. For GRangesFilter it has to be a GRanges object.

condition

character(1) specifying the condition of the filter. For character-based filters (such as GeneIdFilter) "==", "!=", "startsWith" and "endsWith" are supported. Allowed values for integer-based filters (such as GeneStartFilter) are "==", "!=", "<". "<=", ">" and ">=".

object

For supportedFilters: an EnsDb object.

...

For supportedFilters: currently not used.

x

For seqnames, seqlevels: a GRangesFilter object.

Details

ensembldb supports the following filters from the AnnotationFilter package:

GeneIdFilter

filter based on the Ensembl gene ID.

GenenameFilter

filter based on the name of the gene as provided by Ensembl. In most cases this will correspond to the official gene symbol.

SymbolFilter

filter based on the gene names. EnsDb objects don't have a dedicated symbol column, the filtering is hence based on the gene names.

GeneBiotype

filter based on the biotype of genes (e.g. "protein_coding").

GeneStartFilter

filter based on the genomic start coordinate of genes.

GeneEndFilter

filter based on the genomic end coordinate of genes.

EntrezidFilter

filter based on the genes' NCBI Entrezgene ID.

TxIdFilter

filter based on the Ensembld transcript ID.

TxNameFilter

filter based on the Ensembld transcript ID; no transcript names are provided in EnsDb databases.

TxBiotypeFilter

filter based on the transcripts' biotype.

TxStartFilter

filter based on the genomic start coordinate of the transcripts.

TxEndFilter

filter based on the genonic end coordinates of the transcripts.

ExonIdFilter

filter based on Ensembl exon IDs.

ExonRankFilter

filter based on the index/rank of the exon within the transcrips.

ExonStartFilter

filter based on the genomic start coordinates of the exons.

ExonEndFilter

filter based on the genomic end coordinates of the exons.

GRangesFilter

Allows to fetch features within or overlapping specified genomic region(s)/ range(s). This filter takes a GRanges object as input and, if type = "any" (the default) will restrict results to features (genes, transcripts or exons) that are partially overlapping the region. Alternatively, by specifying condition = "within" it will return features located within the range. In addition, the GRangesFilter supports condition = "start", condition = "end" and condition = "equal" filtering for features with the same start or end coordinate or that are equal to the GRanges.

Note that the type of feature on which the filter is applied depends on the method that is called, i.e. genes will filter on the genomic coordinates of genes, transcripts on those of transcripts and exons on exon coordinates.

Calls to the methods exonsBy, cdsBy and transcriptsBy use the start and end coordinates of the feature type specified with argument by (i.e. "gene", "transcript" or "exon") for the filtering.

If the specified GRanges object defines multiple regions, all features within (or overlapping) any of these regions are returned.

Chromosome names/seqnames can be provided in UCSC format (e.g. "chrX") or Ensembl format (e.g. "X"); see seqlevelsStyle for more information.

SeqNameFilter

filter based on chromosome names.

SeqStrandFilter

filter based on the chromosome strand. The strand can be specified with value = "+", value = "-", value = -1 or value = 1.

ProteinIdFilter

filter based on Ensembl protein IDs. This filter is only supported if the EnsDb provides protein annotations; use the hasProteinData method to evaluate.

UniprotFilter

filter based on Uniprot IDs. This filter is only supported if the EnsDb provides protein annotations; use the hasProteinData method to evaluate.

In addition, the following filters are defined by ensembldb:

TxSupportLevel

allows to filter results using the provided transcript support level. Support levels for transcripts are defined by Ensembl based on the available evidences for a transcript with 1 being the highest evidence grade and 5 the lowest level. This filter is only supported on EnsDb databases with a db schema version higher 2.1.

UniprotDbFilter

allows to filter results based on the specified Uniprot database name(s).

UniprotMappingTypeFilter

allows to filter results based on the mapping method/type that was used to assign Uniprot IDs to Ensembl protein IDs.

ProtDomIdFilter

allows to retrieve entries from the database matching the provided filter criteria based on their protein domain ID (protein_domain_id).

OnlyCodingTxFilter

allows to retrieve entries only for protein coding transcripts, i.e. transcripts with a CDS. This filter does not take any input arguments.

Value

For ProtDomIdFilter: A ProtDomIdFilter object.

For UniprotDbFilter: A UniprotDbFilter object.

For UniprotMappingTypeFilter: A UniprotMappingTypeFilter object.

For TxSupportLevel: A TxSupportLevel object.

For supportedFilters: a data.frame with the names and the corresponding field of the supported filter classes.

Note

For users of ensembldb version < 2.0: in the GRangesFilter from the AnnotationFilter package the condition parameter was renamed to type (to be consistent with the IRanges package) . In addition, the condition = "overlapping" is no longer recognized. To retrieve all features overlapping the range type = "any" has to be used.

Protein annotation based filters can only be used if the EnsDb database contains protein annotations, i.e. if hasProteinData is TRUE. Also, only protein coding transcripts will have protein annotations available, thus, non-coding transcripts/genes will not be returned by the queries using protein annotation filters.

Author(s)

Johannes Rainer

See Also

supportedFilters to list all filters supported for EnsDb objects. listUniprotDbs and listUniprotMappingTypes to list all Uniprot database names respectively mapping method types from the database.

GeneIdFilter for more details on the filter objects.

genes, transcripts, exons, listGenebiotypes, listTxbiotypes.

addFilter for globally adding filters to an EnsDb object.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
## Create a filter that could be used to retrieve all informations for
## the respective gene.
gif <- GeneIdFilter("ENSG00000012817")
gif

## Create a filter for a chromosomal end position of a gene
sef <- GeneEndFilter(10000, condition = ">")
sef

## For additional examples see the help page of "genes".


## Example for GRangesFilter:
## retrieve all genes overlapping the specified region
grf <- GRangesFilter(GRanges("11", ranges = IRanges(114000000, 114000050),
                             strand = "+"), type = "any")
library(EnsDb.Hsapiens.v75)
edb <- EnsDb.Hsapiens.v75
genes(edb, filter = grf)

## Get also all transcripts overlapping that region.
transcripts(edb, filter = grf)

## Retrieve all transcripts for the above gene
gn <- genes(edb, filter = grf)
txs <- transcripts(edb, filter = GenenameFilter(gn$gene_name))
## Next we simply plot their start and end coordinates.
plot(3, 3, pch=NA, xlim=c(start(gn), end(gn)), ylim=c(0, length(txs)),
yaxt="n", ylab="")
## Highlight the GRangesFilter region
rect(xleft=start(grf), xright=end(grf), ybottom=0, ytop=length(txs),
col="red", border="red")
for(i in 1:length(txs)){
    current <- txs[i]
    rect(xleft=start(current), xright=end(current), ybottom=i-0.975, ytop=i-0.125, border="grey")
    text(start(current), y=i-0.5,pos=4, cex=0.75, labels=current$tx_id)
}
## Thus, we can see that only 4 transcripts of that gene are indeed
## overlapping the region.


## No exon is overlapping that region, thus we're not getting anything
exons(edb, filter = grf)


## Example for ExonRankFilter
## Extract all exons 1 and (if present) 2 for all genes encoded on the
## Y chromosome
exons(edb, columns = c("tx_id", "exon_idx"),
      filter=list(SeqNameFilter("Y"),
                  ExonRankFilter(3, condition = "<")))


## Get all transcripts for the gene SKA2
transcripts(edb, filter = GenenameFilter("SKA2"))

## Which is the same as using a SymbolFilter
transcripts(edb, filter = SymbolFilter("SKA2"))


## Create a ProteinIdFilter:
pf <- ProteinIdFilter("ENSP00000362111")
pf
## Using this filter would retrieve all database entries that are associated
## with a protein with the ID "ENSP00000362111"
if (hasProteinData(edb)) {
    res <- genes(edb, filter = pf)
    res
}

## UniprotFilter:
uf <- UniprotFilter("O60762")
## Get the transcripts encoding that protein:
if (hasProteinData(edb)) {
    transcripts(edb, filter = uf)
    ## The mapping Ensembl protein ID to Uniprot ID can however be 1:n:
    transcripts(edb, filter = TxIdFilter("ENST00000371588"),
        columns = c("protein_id", "uniprot_id"))
}

## ProtDomIdFilter:
pdf <- ProtDomIdFilter("PF00335")
## Also here we could get all transcripts related to that protein domain
if (hasProteinData(edb)) {
    transcripts(edb, filter = pdf, columns = "protein_id")
}

YTLogos/ensembldb documentation built on May 3, 2019, 9:03 p.m.