Description Usage Arguments Details Value Author(s) See Also Examples
View source: R/genomeDerivedFeatures.R
A function to extract genome-derived features from the input GRanges
object and the annotations specified by TxDb-like objects.
1 2 3 4 5 6 7 8 9 10 11 12 13 |
x |
A GRanges object for the genomic ranges to be extracted. |
transcriptdb |
A TxDb or EnsDb object for the transcript annotation. |
sequence |
A BSgenome or XStringSet object for the genome sequence. See the available.genomes function for how to install a genome. |
gscores |
A GScores object specifying the genomic scores to be extracted, or a |
clusteringY |
A GRanges or GRangesList specifying the object for clustering annotation, or a |
extraRegions |
A GRanges or GRangesList providing the additional region to extract properties, can be a |
flankSizes |
An |
ambiguityMethod |
By default, except the overlapping features, if If |
nomapValue |
When If |
annotSeqnames |
|
annotBiotype |
|
The function first extract multiple genomic properties/attributes on the 14 region types defined by the TxDb-like object,
and then assign the attributes to x
through the region closest to the instances of x
. The specific behaviors of the assignments can be modified through the parameters of ambiguityMethod
and nomapValue
.
Particularly, the genome region types include x
(self), exons, introns, 5'UTR (exons only), full 5'UTR, CDS (exons only), full CDS,
3'UTR (exons only), full 3'UTR, mature transcripts, full transcripts, genes (exons only), full genes, and promoters.
For each region type, the function can generate 5 types of basic properties accordingly:
overlap with region,
region length,
relative position of x
on the region,
distance to region 5'end, and
distance to region 3'end.
In addition, when the corresponding annotation objects are provided, the function can add 5 more types of properties.
Specifically, GC contents of regions are extracted if sequence
is provided;
Genomic Scores of the regions are extracted if gscores
is provided; and
Clustering effects of annotations on regions, including Count, Densities, and the Distance to the nearest annotation
are calculated if clusteringY
is provided.
More region types and properties can be extracted if the list
of objects are supplied at the corresponding arguments extraRegions
, gscores
, and clusteringY
.
The names of the list
elements will correspond to the labeling of the regions or properties in the column names of the returned table.
The returned properties and region types can be subsetted and augmented by with the arguments properties
, regions
, supRegion
.
Finally, the function generate 3 extra features to describe the unique properties at the gene's level: gene's exon number, gene's transcript isoform number, and meta-tx topology.
A data.frame
object whose number of rows is the length of x
.
The column types in the data.frame
are all numeric.
Zhen Wei
The extractRegionProperty for crafting of various properties on genomic regions.
The sequenceDerivedFeatures for extraction of sequence-derived features under different encoding schema.
The topologyOnTranscripts for calculation of the meta-tx topology on transcripts of genes.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 | ## ---------------------------------------------------------------------
## SIMPLE EXAMPLE
## ---------------------------------------------------------------------
## Load the hg19 TxDb object for human transcript annotation (UCSC hg19):
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
## Define the Granges to be extracted:
set.seed(01)
X <- GRanges(rep(c("chr1", "chr2"), c(15, 15)),
IRanges(c(sample(11874:12127, 15), sample(38814:41527,15)), width=5),
strand=Rle(c("+", "-"), c(15, 15)))
## Extract the basic set of properties using the genomic regions defined in TxDb:
gfeatures <- genomeDerivedFeatures(X,
transcriptdb=TxDb.Hsapiens.UCSC.hg19.knownGene)
str(gfeatures)
## ---------------------------------------------------------------------
## DEFINE MORE PROPERTIES AND REGIONS
## ---------------------------------------------------------------------
## Not run:
library(BSgenome.Hsapiens.UCSC.hg19)
library(phastCons100way.UCSC.hg19)
## Extract more genomic properties derived from genome sequence and phastCons score:
gfeatures_expanded <- genomeDerivedFeatures(X,
transcriptdb=TxDb.Hsapiens.UCSC.hg19.knownGene,
sequence=BSgenome.Hsapiens.UCSC.hg19,
gscores=phastCons100way.UCSC.hg19,
clusteringY=X) #quantify clustering properties of x itself
str(gfeatures_expanded)
## Extract region properties features defined in Ensemble transcript annotation (EnsDb):
library(EnsDb.Hsapiens.v75)
gfeatures_ensdb <- genomeDerivedFeatures(X,
transcriptdb=EnsDb.Hsapiens.v75,
sequence=BSgenome.Hsapiens.UCSC.hg19,
gscores=phastCons100way.UCSC.hg19,
clusteringY=X,
annotBiotype=TRUE) #extract transcript biotypes in EnsDb
str(gfeatures_ensdb)
## End(Not run)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.