Description Usage Arguments Retrieving miRNA information Retrieving host genes and transcripts Note Author(s) See Also Examples
Use and retrieve miRNA host gene definitions stored in a corresponding
database. Such database packages can be created using the
makeMirhostgenesPackage
function (see the
corresponding help page for more information).
For basic usage of the database and package, see the
MirhostDb
help page.
## S4 method for signature 'MirhostDb'
hostgenes(x, columns=listColumns(x, "host_gene"),
filter, order.by="gene_id",
order.type="asc", return.type="DataFrame")
## S4 method for signature 'MirhostDb'
hostgenesBy(x, by="pre_mirna_algn",
columns=listColumns(x, "host_gene"), filter,
return.type="DataFrame", drop.empty=TRUE,
use.names=FALSE)
## S4 method for signature 'MirhostDb'
hosttx(x, columns=listColumns(x, "host_tx"), filter,
order.by="tx_id", order.type="asc",
return.type="DataFrame")
## S4 method for signature 'MirhostDb'
hosttxBy(x, by="pre_mirna_algn",
columns=listColumns(x, "host_tx"), filter,
return.type="DataFrame", drop.empty=TRUE,
use.names=FALSE)
## S4 method for signature 'MirhostDb'
matmirnas(x, columns=listColumns(x, "mat_mirna"),
filter, order.by="mat_mirna_id",
order.type="asc", return.type="DataFrame")
## S4 method for signature 'MirhostDb'
matmirnasBy(x, by="pre_mirna_algn",
columns=listColumns(x, "mat_mirna"), filter,
return.type="DataFrame", use.names=FALSE)
## S4 method for signature 'MirhostDb'
matmirnasInMultiplePremirnas(x,columns=c(listColumns(x, "mat_mirna"),
"pre_mirna_id", "pre_mirna_name"),
filter=list(),
return.type="DataFrame")
## S4 method for signature 'MirhostDb'
premirnas(x, columns=listColumns(x, "pre_mirna"), filter,
order.by="pre_mirna_id", order.type="asc",
return.type="DataFrame")
## S4 method for signature 'MirhostDb'
premirnasBy(x, by="mat_mirna",
columns=listColumns(x, "pre_mirna"),
filter, return.type="DataFrame",
use.names=FALSE)
## S4 method for signature 'MirhostDb'
premirnasWithMultipleAlignments(x,
columns=listColumns(x, "pre_mirna"),
filter=list(),
return.type="DataFrame")
## S4 method for signature 'MirhostDb'
probesets(x, columns=listColumns(x, "array_feature"), filter,
order.by="probeset_id", order.type="asc",
return.type="DataFrame")
## S4 method for signature 'MirhostDb'
probesetsBy(x, by="pre_mirna_algn",
columns=listColumns(x, "array_feature"),
filter, return.type="DataFrame",
drop.empty=TRUE,
use.names=FALSE)
(in alphabetical order)
by |
For |
columns |
Character vector of columns (attributes) to return from the database. For a
complete list of available columns use the methods
|
drop.empty |
For |
filter |
A single filter instance or |
order.by |
The column by which the result should be ordered. Can also be a
string with multiple columns, separated by a |
order.type |
Either |
return.type |
Specifies the class of the result object. Allowed values are
Note that methods
|
use.names |
If available, uses the names instead of the IDs to group elements (e.g. the pre-miRNA name instead of the pre-miRNA ID). Note that the gene name (symbol) might be empty for some genes; all entries for genes without a name would then be grouped together. |
x |
The |
These methods provide access to all miRNA-related information in the database (i.e. mature miRNAs and pre-miRNAs).
Returns all mature miRNAs from the database along with optional
additional columns from other database tables (which can be
empty for some mature miRNAs). Note that column "sequence"
returns the actual RNA sequence of the mature miRNA, not the
genomic DNA defined by the columns "mat_mirna_seq_start"
and "mat_mirna_seq_end"
.
Also, be aware that mature miRNAs encoded in several pre-miRNAs or in
pre-miRNAs with multiple genomic alignments are listed in
multiple rows of the results table (as their start and end
coordinates differ).
To get a unique list of miRNAs, the columns argument
should be set to
c("mat_mirna_id", "mat_mirna_name")
.
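The effect of restricting the columns can be sketched in base R without the database. The data frame below is a toy stand-in for a matmirnas result; all IDs and coordinates are made up for illustration, and only the column names follow the ones mentioned above.

```r
# Toy stand-in for a matmirnas() result; all values are hypothetical.
mat <- data.frame(
  mat_mirna_id        = c("MAT_A", "MAT_A", "MAT_B"),
  mat_mirna_name      = c("mir-a", "mir-a", "mir-b"),
  pre_mirna_id        = c("PRE_1", "PRE_2", "PRE_3"),
  mat_mirna_seq_start = c(100L, 5000L, 42L),
  stringsAsFactors    = FALSE
)

# MAT_A occupies two rows because it is encoded in two pre-miRNAs
# with different coordinates.
nrow(mat)

# Keeping only the identifying columns lets the duplicates collapse.
uniq <- unique(mat[, c("mat_mirna_id", "mat_mirna_name")])
nrow(uniq)
```

In the package itself the same outcome would be requested through the columns argument rather than by post-processing the result.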
The method returns a DataFrame
, data.frame
or
GRanges
depending on the value of the return.type
argument ("DataFrame"
, "data.frame"
or
"GRanges"
, respectively). Entries in the returned object
are ordered according to the parameter order.by
, NOT by any
ordering of values in filter objects that were supplied.
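A minimal base R sketch of this ordering behaviour, using a hypothetical toy table instead of the database: the result follows the ordering column, and the order in which IDs were requested has to be restored explicitly if it matters.

```r
# Toy table; IDs and names are made up for illustration.
mat <- data.frame(
  mat_mirna_id     = c("MAT_A", "MAT_B", "MAT_C"),
  mat_mirna_name   = c("mir-a", "mir-b", "mir-c"),
  stringsAsFactors = FALSE
)

# IDs in the order a user might pass them to a filter.
wanted <- c("MAT_C", "MAT_A")

# Filtering followed by ordering on the ID column mimics order.by:
# the request order is lost.
res <- mat[mat$mat_mirna_id %in% wanted, ]
res <- res[order(res$mat_mirna_id), ]
res$mat_mirna_id

# Restore the request order afterwards with match().
res_req <- res[match(wanted, res$mat_mirna_id), ]
res_req$mat_mirna_id
```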
Returns a CompressedSplitDataFrameList
of DataFrame
s
or a list
of data.frame
s with the names of the list
being the ids by which the mature miRNAs are grouped (e.g. pre-miRNA
ids) and the elements of the list being the mature miRNA
entries. Similar to matmirnas
, column "sequence"
in
the result object contains the RNA sequence of the mature miRNA.
The method returns a SplitDataFrameList
(list of
DataFrame
s), a list
of data.frame
s or a
GRangesList
, depending on the value of the parameter
return.type
("DataFrame"
, "data.frame"
or
"GRanges"
, respectively). The results are ordered by the
value of the by
parameter.
Returns mature miRNAs which are encoded in more than one
pre-miRNA. The return object is the same as for
matmirnas
.
Returns pre-miRNAs defined by the miRBase along with optional
additional columns from other database tables (which can be
NA
for some pre-miRNAs). Note that column "sequence"
returns the actual RNA sequence of the pre-miRNA, not the
genomic DNA defined by the columns "pre_mirna_seq_start"
and "pre_mirna_seq_end"
.
Also, some pre-miRNAs might have multiple genomic alignments
and might thus be listed multiple times in the returned object.
The method returns a DataFrame
, data.frame
or
GRanges
depending on the value of the return.type
argument ("DataFrame"
, "data.frame"
or
"GRanges"
, respectively). Entries in the returned object
are ordered according to the parameter order.by
, NOT by any
ordering of values in filter objects that were supplied.
Returns a CompressedSplitDataFrameList
of DataFrame
s
or a list
of data.frame
s with the names of the list
being the ids by which the pre-miRNAs are grouped (e.g. mature miRNA
ids) and the elements of the list being the pre-miRNA entries.
The method returns a SplitDataFrameList
(list of
DataFrame
s), a list
of data.frame
s or a
GRangesList
, depending on the value of the parameter
return.type
("DataFrame"
, "data.frame"
or
"GRanges"
, respectively). The results are ordered by the
value of the by
parameter.
Returns pre-miRNAs which are encoded in several genomic loci. The
return object is the same as for premirnas
.
These methods retrieve host genes and transcripts as well as the microarray features (probe sets) targeting them.
Returns all predicted host genes from the database along with
optional additional columns from other database tables.
Host genes with gene_biotype
equal to "miRNA"
should
be treated with care, as they represent the actual
pre-miRNAs. Ensembl defines genes for some of the pre-miRNAs
defined in the miRBase. The column/attribute database
specifies in which database the gene is defined ("core"
,
"otherfeatures"
and "vega"
indicating the Ensembl
core database with all known genes, the Ensembl otherfeatures
database and the manually curated Ensembl vega database).
The method returns a DataFrame
or data.frame
depending on the value of the return.type
argument
("DataFrame"
or "data.frame"
). Entries in the returned
object are ordered according to the parameter order.by
, NOT
by any ordering of values in filter objects that were supplied.
Returns a CompressedSplitDataFrameList
of DataFrame
s
or a list
of data.frame
s with the names of the list
being the ids by which the host genes are grouped (e.g. pre-miRNA
ids) and the elements of the list being the host gene entries.
Note that by default empty elements are dropped (see parameter
drop.empty
).
The method returns a SplitDataFrameList
(list of
DataFrame
s) or a list
of data.frame
s
depending on the value of the parameter return.type
("DataFrame"
or "data.frame"
). The results are
ordered by the value of the by
parameter.
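The idea behind dropping empty elements can be illustrated in base R. This is only an analogy on made-up data, not the package's implementation: here an empty group is a zero-row data frame, whereas the package may represent it differently.

```r
# Hypothetical pre-miRNA IDs; PRE_2 has no host gene.
pre_ids <- c("PRE_1", "PRE_2", "PRE_3")
host <- data.frame(
  pre_mirna_id     = c("PRE_1", "PRE_1", "PRE_3"),
  gene_id          = c("GENE_X", "GENE_Y", "GENE_Z"),
  stringsAsFactors = FALSE
)

# Grouping on a factor that knows all pre-miRNAs keeps the empty group,
# comparable in spirit to drop.empty=FALSE.
grouped_all <- split(host, factor(host$pre_mirna_id, levels = pre_ids))
names(grouped_all)

# Removing groups without rows is comparable to the default drop.empty=TRUE.
has_rows <- vapply(grouped_all, nrow, integer(1)) > 0
grouped_nonempty <- grouped_all[has_rows]
names(grouped_nonempty)
```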
Returns all predicted host transcripts from the database along
with optional additional columns from other database tables.
Note that for host transcripts being the host for several
pre-miRNAs multiple rows are present in the result table (one for
each pre-miRNA). To get a unique list of host transcripts, the
columns
parameter should be restricted to c("tx_id",
"tx_biotype", "gene_id")
.
The columns in_intron
and in_exon
specify in which
intron or exon of the transcript the pre-miRNA is encoded (0 for
not in intron or exon), exon_id
indicates the exon id for
exonic pre-miRNAs and the column is_outside
indicates
whether the pre-miRNA is only partially inside the transcript.
See the package's vignette for a detailed description.
The method returns a DataFrame
or data.frame
depending on the value of the return.type
argument
("DataFrame"
or "data.frame"
). Entries in the returned
object are ordered according to the parameter order.by
, NOT
by any ordering of values in filter objects that were supplied.
Returns a CompressedSplitDataFrameList
of DataFrame
s
or a list
of data.frame
s with the names of the list
being the ids by which the host transcripts are grouped (e.g. pre-miRNA
ids) and the elements of the list being the host transcript entries.
Note that by default empty elements are dropped (see parameter
drop.empty
).
The method returns a SplitDataFrameList
(list of
DataFrame
s) or a list
of data.frame
s
depending on the value of the parameter return.type
("DataFrame"
or "data.frame"
). The results are
ordered by the value of the by
parameter.
Returns microarray probe sets which were found to target the host
transcripts. Note that the database can store probe sets for different
microarrays, so it might be advisable to use an
ArrayFilter
to restrict to probe sets for one
specific microarray (use listArrays
to get an
overview of all microarrays for which probe sets are available).
The method returns a DataFrame
or data.frame
depending on the value of the return.type
argument
("DataFrame"
or "data.frame"
). Entries in the returned
object are ordered according to the parameter order.by
, NOT
by any ordering of values in filter objects that were supplied.
Returns microarray probe sets grouped by the column specified with
the argument by
.
The method returns a SplitDataFrameList
(list of
DataFrame
s) or a list
of data.frame
s
depending on the value of the parameter return.type
("DataFrame"
or "data.frame"
). The results are
ordered by the value of the by
parameter.
The default grouping of transcripts or genes for hosttxBy
and
hostgenesBy
is by the pre_mirna_algn
(i.e. the alignment
ID of the pre-miRNA), since pre-miRNAs might have multiple genomic
alignments, and transcripts or genes grouped by the pre-miRNA alone
might therefore be encoded on different chromosomes.
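Why grouping by alignment is the safer default can be shown on a toy table in base R. The column name pre_mirna_algn_id and all values are assumptions made for this sketch; only seq_name and gene_id appear as column names elsewhere on this page.

```r
# One hypothetical pre-miRNA with two genomic alignments on different
# chromosomes, each inside a different host gene.
host <- data.frame(
  pre_mirna_algn_id = c(1L, 2L),
  pre_mirna_id      = c("PRE_1", "PRE_1"),
  seq_name          = c("1", "9"),
  gene_id           = c("GENE_X", "GENE_Y"),
  stringsAsFactors  = FALSE
)

# Number of distinct chromosomes within each group.
chrom_per_group <- function(groups) {
  vapply(groups, function(g) length(unique(g$seq_name)), integer(1))
}

# Grouping by pre-miRNA mixes host genes from two chromosomes.
by_pre <- split(host, host$pre_mirna_id)
chrom_per_group(by_pre)

# Grouping by alignment keeps every group on a single chromosome.
by_algn <- split(host, host$pre_mirna_algn_id)
chrom_per_group(by_algn)
```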
For the matmirnas
,premirnas
, hostgenes
and
hosttx
methods the internal SQL call uses a left join starting
from the respective table (e.g. "mature_mirna"
for
matmirnas
), thus returning all entries from that table, but
possibly NA
s for columns from other tables if no value from
that table is linked to any of the entries in the first table.
As a result, a call to premirnas
with columns set to
"pre_mirna_name"
and "tx_id"
will return the IDs of all
pre-miRNAs and the ID of their respective putative host transcripts,
or NA
if none was defined. A call to hosttx
with the
same columns will, however, return fewer results from the database, as
IDs of pre-miRNAs without a specified host transcript are not
returned (see example below).
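The asymmetry of the left join can be reproduced with base R's merge on two toy tables. The table contents are made up; the sketch only mirrors the join direction described above and makes no claim about the actual SQL used by the package.

```r
# Hypothetical pre-miRNA table: mir-2 has no host transcript.
pre <- data.frame(
  pre_mirna_name   = c("mir-1", "mir-2", "mir-3"),
  stringsAsFactors = FALSE
)
# Hypothetical host transcript table: mir-1 has two host transcripts.
tx <- data.frame(
  pre_mirna_name   = c("mir-1", "mir-1", "mir-3"),
  tx_id            = c("TX_1", "TX_2", "TX_3"),
  stringsAsFactors = FALSE
)

# Left join starting from the pre-miRNA table: every pre-miRNA is kept,
# with NA as tx_id where no host transcript exists.
from_pre <- merge(pre, tx, by = "pre_mirna_name", all.x = TRUE)

# Left join starting from the transcript table: only pre-miRNAs that
# have a host transcript appear.
from_tx <- merge(tx, pre, by = "pre_mirna_name", all.x = TRUE)

nrow(from_pre)
nrow(from_tx)
sum(is.na(from_pre$tx_id))
sum(is.na(from_tx$tx_id))
```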
In functions matmirnasBy
, premirnasBy
,
hostgenesBy
and hosttxBy
, the internal left join starts
from the database table in which the attribute (column) specified with
the by
argument is defined. As a consequence, entries for which
the column specified by by
is empty are NOT returned.
To get all entries from the database, the methods matmirnas
,
premirnas
, hostgenes
and hosttx
can be used
instead, adding additional column names to the columns
argument.
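As a rough base R analogy on made-up data, grouping drops every entry whose grouping value is missing, while the flat table still contains it. The column mirfam_name is used in the examples on this page; the values are hypothetical.

```r
# Hypothetical pre-miRNAs; mir-2 is not assigned to any miRNA family.
pre <- data.frame(
  pre_mirna_name   = c("mir-1", "mir-2", "mir-3"),
  mirfam_name      = c("fam-a", NA, "fam-a"),
  stringsAsFactors = FALSE
)

# split() silently discards rows whose grouping value is NA, much like
# the By methods omit entries with an empty by column.
by_fam <- split(pre, pre$mirfam_name)
names(by_fam)
sum(vapply(by_fam, nrow, integer(1)))

# The ungrouped table still lists all three pre-miRNAs.
nrow(pre)
```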
Johannes Rainer
MirhostDb
, listColumns
, listTables
makeMirhostgenesPackage
, PositionFilter
library(MirhostDb.Hsapiens.v75.v20)
## define a "shortcut" to the database
Mhdb <- MirhostDb.Hsapiens.v75.v20
##***************************************
##
## mature miRNAs
##
##***************************************
## Simply get all mature miRNAs; the result is however not a unique list of miRNAs,
## since miRNAs from pre-miRNAs with multiple genomic alignments are listed in
## multiple rows.
MatMir <- matmirnas(Mhdb)
MatMir
length(unique(MatMir$mat_mirna_id))
## Get mat_mirna and pre_mirna entries for mature miRNA MIMAT0000062.
MatMir <- matmirnas(Mhdb,
columns=unique(c(listColumns(Mhdb, "mat_mirna"),
listColumns(Mhdb, "pre_mirna"))),
filter=list(MatMirnaIdFilter("MIMAT0000062")))
MatMir
## The same mature miRNA is encoded in 3 different pre-miRNAs.
## Get all mature miRNAs along with their pre-miRNAs in which they are encoded
## and their sequence.
MatMir <- matmirnas(Mhdb, columns=c("mat_mirna_id", "mat_mirna_name",
"pre_mirna_name", "seq_name", "sequence"))
MatMir
length(unique(MatMir$mat_mirna_id))
length(unique(MatMir$pre_mirna_name))
## Get all mature miRNAs along with the potential host gene in which they are encoded.
MatMir <- matmirnas(Mhdb, columns=c("mat_mirna_id", "mat_mirna_name",
"seq_name", "gene_id", "gene_name", "gene_biotype"))
MatMir
## The mature miRNAs present in host genes.
MatMir.inhg <- MatMir[ !is.na(MatMir$gene_id), ]
MatMir.nohg <- MatMir[ is.na(MatMir$gene_id), ]
MatMir.inhg
## However, a considerable number of "host genes" are actually pre-miRNAs, some of which
## are stored in the Ensembl database as "gene" with the biotype "miRNA".
table(MatMir.inhg$gene_biotype)
## Now, get all mature miRNAs for which the gene_biotype!=miRNA.
MatMir <- matmirnas(Mhdb, columns=c("mat_mirna_id", "mat_mirna_name",
"seq_name", "gene_id", "gene_name", "gene_biotype"),
filter=list(GeneBiotypeFilter("miRNA", condition="!=")))
MatMir
sum(is.na(MatMir$gene_biotype))
table(MatMir$gene_biotype)
## Get all mature miRNAs as GRanges.
matmirnas(Mhdb, return.type="GRanges")
## Get all mature miRNAs that are encoded in more than one pre-miRNA.
matmirnasInMultiplePremirnas(Mhdb)
##***************************
## matmirnasBy
## Get all mature miRNAs grouped by pre-miRNA.
matmirnasBy(Mhdb, by="pre_mirna")
## Get all mature miRNAs grouped by mirfam as GRanges.
matmirnasBy(Mhdb, by="mirfam", return.type="GRanges")
## Get mature miRNAs for pre-miRNA miR-16-1 and miR-16-2.
matmirnasBy(Mhdb,
filter=list(PreMirnaFilter(c("hsa-mir-16-2", "hsa-mir-16-1"))))
##***************************************
##
## pre-miRNAs
##
##***************************************
## Get all pre-miRNAs.
PreMir <- premirnas(Mhdb)
PreMir
length(unique(PreMir$pre_mirna_name))
## Get all pre-miRNAs as GRanges.
premirnas(Mhdb, return.type="GRanges")
## Get all pre-miRNAs along with their miRNA family and their sequence.
## Since we don't ask for the pre_mirna_seq_start and end we get a
## unique table of pre-miRNAs.
PreMir <- premirnas(Mhdb, columns=c("pre_mirna_name", "mirfam_name",
"sequence"))
PreMir
## We have some pre-miRNAs without family
sum(is.na(PreMir$mirfam_name))
## but none without sequence.
sum(is.na(PreMir$sequence))
## Get all pre-miRNAs with multiple genomic alignments.
premirnasWithMultipleAlignments(Mhdb)
##***************************
## premirnasBy
## Get the pre-miRNAs by the mature_mirna.
PB <- premirnasBy(Mhdb, by="mat_mirna")
## Add also additional stuff and fetch all pre-miRNAs for host gene SMC4:
premirnasBy(Mhdb, columns=c("pre_mirna_name", "sequence", "mirfam_name",
"mat_mirna_name"), filter=list(GenenameFilter("SMC4")))
## Get all pre-miRNAs by host_gene SMC4.
premirnasBy(Mhdb, by="host_gene", filter=list(GenenameFilter("SMC4")))
## Get all pre-miRNAs by host_gene SMC4 as GRanges.
premirnasBy(Mhdb, by="host_gene", filter=list(GenenameFilter("SMC4")),
return.type="GRanges")
##***************************************
##
## host transcripts
##
##***************************************
## Get all host transcripts from the database.
HT <- hosttx(Mhdb)
HT
nrow(HT)
## The same host_tx might be the host for multiple miRNAs, thus we do have non-unique tx_ids.
length(unique(HT$tx_id))
## Get a unique table of host transcripts.
HT <- hosttx(Mhdb, columns=c("tx_id", "tx_biotype", "gene_id"))
HT
nrow(HT)
length(unique(HT$tx_id))
## Get the host transcripts along with the corresponding gene.
HT <- hosttx(Mhdb, columns=c("tx_id", "in_intron", "in_exon", "gene_id",
"gene_name", "entrezid", "database"))
HT
## In what databases are these transcripts defined?
table(HT$database)
nrow(HT)
## Note that the information from the various databases is redundant
## (e.g. the same gene can be defined in the Ensembl core database as
## well as in the NCBI RefSeq database, whose genes are provided through
## the Ensembl otherfeatures database).
## To avoid getting redundant entries it is possible to use a
## DatabaseFilter:
HT <- hosttx(Mhdb, columns=c("tx_id", "in_intron", "in_exon", "gene_id",
"gene_name", "entrezid", "database"),
filter=list(DatabaseFilter("core")))
HT
nrow(HT)
## Include now also the pre_mirna ids.
HT <- hosttx(Mhdb, columns=c("tx_id", "in_intron", "in_exon", "gene_id",
"gene_name", "entrezid", "database",
"pre_mirna_id", "pre_mirna_name"))
HT
nrow(HT)
## We have now more rows, since different pre-miRNAs might be
## associated with the same host_tx.
length(unique(HT$tx_id))
##***************************
## hosttxBy
## Get the host transcripts by the pre-miRNA
## this will drop automatically empty entries, i.e. pre-miRNAs for which
## no host transcript was defined.
HT <- hosttxBy(Mhdb, by="pre_mirna", columns=c("tx_id", "tx_biotype",
"in_intron", "in_exon",
"pre_mirna_name"))
HT
## To get all of them we can set drop.empty=FALSE.
HT <- hosttxBy(Mhdb, by="pre_mirna",
columns=c("tx_id", "tx_biotype", "in_intron", "in_exon",
"pre_mirna_name"), drop.empty=FALSE)
HT
## There are however also some without any entries:
empties <- unlist(lapply(HT, function(z){ return(all(is.na(z$tx_id))) }))
sum(empties)
HT[ empties ]
## Host transcripts by gene.
HT <- hosttxBy(Mhdb, by="host_gene")
HT
##***************************************
##
## host genes
##
##***************************************
## With the host genes it is just the same as above.
HG <- hostgenes(Mhdb)
HG
length(unique(HG$gene_id))
nrow(HG)
##***************************
## hostgenesBy
## Get the host genes by the pre-miRNA.
HG <- hostgenesBy(Mhdb, by="pre_mirna")
HG
## Get host genes by mirfam.
HG <- hostgenesBy(Mhdb, by="mirfam",
columns=c("gene_id", "gene_name", "mirfam_name"))
HG
##***************************************
##
## probe sets
##
##***************************************
## First get a list of microarrays for which probe sets are available.
listArrays(Mhdb)
AF <- ArrayFilter("HG-U133_Plus_2")
## Get all probe sets from the database along with the gene name and
## the pre-miRNA name.
PS <- probesets(Mhdb, columns=c(listColumns(Mhdb, "array_feature" ),
"gene_name", "pre_mirna_name"), filter=list(AF))
PS
## Get all probe sets grouped by pre-miRNA name.
PS <- probesetsBy(Mhdb, by="pre_mirna", use.names=TRUE, filter=list(AF))
PS
##***************************************
##
## The effect of the left join
##
##***************************************
## Get all pre-miRNAs and the ID of the host transcript.
fromPre <- premirnas(Mhdb, columns=c("pre_mirna_name", "tx_id"))
## Get the same columns, but starting from table "host_tx"
fromTx <- hosttx(Mhdb, columns=c("pre_mirna_name", "tx_id"))
## We have fewer rows for the latter query.
nrow(fromPre)
nrow(fromTx)
## The reason being, that pre-miRNAs without host transcript are not returned
## by the second query, while they are for the first.
sum(is.na(fromPre$tx_id))
sum(is.na(fromTx$tx_id))