View source: R/predictors_annot.R
predictors_annot generates features for the RNA modification sites or targets stored in a SummarizedExperiment object.
predictors_annot(se, txdb, bsgnm, fc = NULL, pc = NULL,
  struct_hybridize = NULL, feature_lst = NULL, motif = c("AAACA",
  "GAACA", "AGACA", "GGACA", "AAACT", "GAACT", "AGACT", "GGACT", "AAACC",
  "GAACC", "AGACC", "GGACC"), motif_clustering = "DRACH",
  annot_clustering = NULL, hk_genes_list = NULL,
  isoform_ambiguity_method = c("longest_tx", "average"),
  genes_ambiguity_method = c("drop_overlap", "average"),
  standardization = TRUE)
se |
A |
txdb |
|
bsgnm |
|
fc, pc |
Optional; Gulko B, Hubisz MJ, Gronau I and Siepel A (2015). “Probabilities of fitness consequences for point mutations across the human genome.” Nature Genetics, 47, pp. 276-283. Siepel A et al. (2005). “Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes.” Genome Research, 15, pp. 1034-1050. |
struct_hybridize |
Optional; A The precomputed MEA secondary structures can be found in the data attached to this package: |
feature_lst |
Optional; A list of |
motif |
A character vector indicating the motifs centered on the modified nucleotide; the motif will not be attached if the By default, the motif selected is RRACH: c("AAACA","GAACA","AGACA","GGACA","AAACT","GAACT","AGACT","GGACT","AAACC","GAACC","AGACC","GGACC"). |
motif_clustering |
A character vector indicating the motif used to generate the features for the clustering indexes, Default: "DRACH". |
annot_clustering |
A The resulting clustering features will be named |
hk_genes_list |
Optional; A character string of the Gene IDs of the housekeeping genes. The Gene IDs should correspond to the Gene IDs used by the provided The Entrez gene IDs of the housekeeping genes of mm10 and hg19 are included in this package: |
isoform_ambiguity_method |
Can be "longest_tx" or "average". The former keeps only the longest transcript as the transcript annotation. The latter uses the average of the feature entries when a site maps to multiple transcript isoforms. |
genes_ambiguity_method |
Can be "drop_overlap" or "average". The former does not annotate modification sites that overlap more than one gene (NA is returned). The latter uses the average of the feature entries when a site maps to multiple genes. |
standardization |
A logical indicating whether to standardize the continuous features; Default TRUE. |
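The two genes_ambiguity_method options described above can be illustrated with a small base R sketch. The helper resolve_ambiguity and the toy values are hypothetical and not part of the package; the actual implementation may differ.

```r
# Hypothetical helper: combine the per-gene feature values of a single site.
resolve_ambiguity <- function(values, method = c("drop_overlap", "average")) {
  method <- match.arg(method)            # the first choice acts as the default
  if (length(values) == 0) return(NA_real_)
  if (length(values) == 1) return(values)
  switch(method,
         drop_overlap = NA_real_,        # more than one gene: leave unannotated
         average      = mean(values))    # more than one gene: average the entries
}

gene_lengths <- c(1200, 1800)            # toy values: site overlapping two genes
resolve_ambiguity(gene_lengths)              # NA
resolve_ambiguity(gene_lengths, "average")   # 1500
resolve_ambiguity(1200)                      # 1200 (unambiguous site)
```

Listing the choices in the default vector and resolving them with match.arg() is a common R idiom; whether predictors_annot uses it internally is an assumption.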
This function retrieves transcript-related features previously known to be associated with m6A modifications, based on the rowRanges of the provided SummarizedExperiment, and returns them as metadata columns of the SummarizedExperiment.
The following features are always included:
###1. Transcript regions ### The entries are logical (dummy) variables.
- UTR5: 5'UTR.
- UTR3: 3'UTR.
- cds: Coding Sequence.
- Stop_codons: Stop codon (301 bp center).
- Start_codons: Start codon (201 bp center).
- m6Am: 5'Cap m6Am (TSS that has underlying sequence of A).
- Exons: Exonic regions.
- last_exons_50bp: First 50bp of the last exon of a transcript.
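As a rough illustration of how such a logical region feature can be derived, the base R sketch below flags sites that fall inside any interval of a region set. The helper in_region and the toy coordinates are hypothetical; the package itself works on genomic range objects, and its overlap logic is not shown in this page.

```r
# Toy site positions and toy region intervals (1-based, inclusive coordinates).
sites   <- c(50, 150, 400)
regions <- data.frame(start = c(1, 300), end = c(100, 450))

# Hypothetical helper: TRUE if a site falls inside at least one interval.
in_region <- function(pos, start, end) {
  vapply(pos, function(p) any(start <= p & p <= end), logical(1))
}

UTR5_toy <- in_region(sites, regions$start, regions$end)
UTR5_toy   # TRUE FALSE TRUE
```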
###2. Relative positions ### The entries lie in the range [0,1]. If the site is not mapped to the region named on the right, the value is set to 0 (this can be viewed as an interaction term on top of the region model).
- pos_UTR5: Relative positioning on 5'UTR.
- pos_UTR3: Relative positioning on 3'UTR.
- pos_cds: Relative positioning on Coding Sequence.
- pos_Tx: Relative positioning on Transcript.
- pos_exons: Relative positioning on exons.
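One way to read the relative position features is sketched below in base R: the offset of the site from the region start divided by the region span, with 0 for unmapped sites. The helper relative_pos is hypothetical, ignores strand, and assumes regions longer than 1 nt; the exact formula used by the package is not stated here.

```r
# Hypothetical helper: relative position of each site within its region.
relative_pos <- function(pos, start, end) {
  inside <- !is.na(start) & pos >= start & pos <= end
  out <- numeric(length(pos))              # unmapped sites keep the value 0
  out[inside] <- (pos[inside] - start[inside]) / (end[inside] - start[inside])
  out
}

pos   <- c(100, 150, 200, 500)
start <- c(100, 100, 100, NA)              # NA: site not mapped to this region
end   <- c(200, 200, 200, NA)
relative_pos(pos, start, end)              # 0.0 0.5 1.0 0.0
```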
###3. Region length ###
- long_UTR3: Long 3'UTR (length > 400bp).
- long_exon: Long exon (length > 400bp).
- Gene_length_ex: standardized gene length of exonic regions (z score).
- Gene_length_all: standardized gene length of all regions (z score).
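The length features combine a fixed threshold with z score standardization. The base R sketch below shows both on toy lengths; the variable names are illustrative, and the use of the sample standard deviation (as in sd()) is an assumption about how the z score is computed.

```r
utr3_length <- c(150, 400, 950)            # toy 3'UTR lengths in bp
long_UTR3   <- utr3_length > 400           # strictly greater than 400bp
long_UTR3                                  # FALSE FALSE TRUE

gene_length <- c(1000, 2000, 3000)         # toy gene lengths in bp
# z score: subtract the mean, then divide by the sample standard deviation.
gene_length_z <- (gene_length - mean(gene_length)) / sd(gene_length)
gene_length_z                              # -1 0 1
```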
#####=============== The following features are optional ===============#####
###4. Motif ###
By default, the following motifs are searched, i.e. all instances of RRACH: c("AAACA","GAACA","AGACA","GGACA","AAACT","GAACT","AGACT","GGACT","AAACC","GAACC","AGACC","GGACC").
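To see why the default vector corresponds to RRACH, the base R sketch below expands a degenerate motif into its concrete sequences using the IUPAC codes R = A/G, D = A/G/T and H = A/C/T. The helper expand_motif is hypothetical and only meant for building or checking a motif argument.

```r
# IUPAC ambiguity codes needed for RRACH and DRACH.
iupac <- list(A = "A", C = "C", G = "G", T = "T",
              R = c("A", "G"), D = c("A", "G", "T"), H = c("A", "C", "T"))

# Hypothetical helper: every concrete sequence matching a degenerate motif.
expand_motif <- function(motif) {
  letters_per_pos <- unname(iupac[strsplit(motif, "")[[1]]])
  combos <- expand.grid(letters_per_pos, stringsAsFactors = FALSE)
  sort(do.call(paste0, combos))
}

rrach <- expand_motif("RRACH")
length(rrach)                      # 12, the same set as the default motif argument
length(expand_motif("DRACH"))      # 18
```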
###5. Evolutionary fitness ###
- PC 1bp: standardized PC score 1 nt.
- PC 201bp: standardized PC score 101 nt.
- FC 1bp: standardized Fitness consequences scores 1bp.
- FC 5nt: standardized Fitness consequences scores 101bp.
###6. User specified features by argument feature_lst ###
The entries are logical / dummy variables indicating whether a site overlaps each GRanges or GRangesList in feature_lst.
###7. Gene attribute ###
- sncRNA: small noncoding RNA (<= 200bp)
- lncRNA: long noncoding RNA (> 200bp)
- Isoform_num: Transcript isoform numbers standardized by z score.
- HK_genes: mapped to housekeeping genes, as defined in the paper below.
Eisenberg E, Levanon EY (October 2013). "Human housekeeping genes, revisited". Trends in Genetics. 29
###8. Batch effect ###
- GC_cont_genes: GC content of each gene.
- GC_cont_101bp: GC content of 101bp local region of the sites.
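The local GC content feature can be pictured as the fraction of G and C bases in a window centred on the site. The base R sketch below uses plain character strings and two hypothetical helpers, gc_content and window_seq; the package operates on genome sequence objects, so this is only an illustration of the arithmetic.

```r
# Hypothetical helper: fraction of G/C among the bases of a sequence.
gc_content <- function(seq) {
  bases <- strsplit(toupper(seq), "")[[1]]
  mean(bases %in% c("G", "C"))
}

# Hypothetical helper: window of `width` nt centred on a 1-based site position,
# clipped at the ends of the sequence.
window_seq <- function(seq, site, width = 101) {
  half <- (width - 1) / 2
  substr(seq, max(1, site - half), min(nchar(seq), site + half))
}

tx <- "AAGGACTTGCAT"                               # toy sequence
window_seq(tx, site = 5, width = 5)                # "GGACT"
gc_content(window_seq(tx, site = 5, width = 5))    # 3/5 = 0.6
```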
This function returns a SummarizedExperiment object whose mcols hold the feature (design) matrix.
glm_bas, glm_multinomial, glm_regular to perform model selection, statistics calculation, and visualization across multiple samples.
### ==== For hg19 ==== ###
library(SummarizedExperiment)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
library(BSgenome.Hsapiens.UCSC.hg19)
library(fitCons.UCSC.hg19)
library(phastCons100way.UCSC.hg19)

### Additional genomic features; each entry is a GRanges or GRangesList.
Feature_List_hg19 = list(
  HNRNPC_eCLIP = eCLIP_HNRNPC_gr,
  YTHDC1_TREW = YTHDC1_TREW_gr,
  YTHDF1_TREW = YTHDF1_TREW_gr,
  YTHDF2_TREW = YTHDF2_TREW_gr,
  miR_targeted_genes = miR_targeted_genes_grl,
  #miRanda = miRanda_hg19_gr,
  TargetScan = TargetScan_hg19_gr,
  Verified_miRtargets = verified_targets_gr
)

SE_features_added <- predictors_annot(se = SummarizedExperiment(rowRanges = hg19_miCLIP_gr),
    txdb = TxDb.Hsapiens.UCSC.hg19.knownGene,
    bsgnm = Hsapiens,
    fc = fitCons.UCSC.hg19,
    pc = phastCons100way.UCSC.hg19,
    struct_hybridize = Struc_hg19,
    feature_lst = Feature_List_hg19,   ### the list defined above
    hk_genes_list = HK_hg19_eids,
    motif = c("AAACA","AGACA","AAACT","AGACT","AAACC","AGACC",
              "GAACA","GGACA","GAACT","GGACT","GAACC","GGACC",
              "TAACA","TGACA","TAACT","TGACT","TAACC","TGACC"),
    motif_clustering = "DRACH",
    standardization = FALSE,
    genes_ambiguity_method = "average")

mcols(SE_features_added) ###Check the generated feature matrix.