View source: R/predictors_annot_old.R
predictors_annot_old is used to generate features given a SummarizedExperiment object of RNA modification / target.
se |
A SummarizedExperiment object. |
txdb |
|
bsgnm |
|
fc, pc |
Optional; Gulko B, Hubisz MJ, Gronau I and Siepel A (2015). "Probabilities of fitness consequences for point mutations across the human genome." Nature Genetics, 47, pp. 276-283. Siepel A et al. (2005). "Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes." Genome Research, 15, pp. 1034-1050. |
struct_hybridize |
Optional; the precomputed MEA secondary structures can be found in the data attached to this package. |
feature_lst |
Optional; a list of GRanges or GRangesList objects defining user-specified features. |
motif |
A character vector indicating the motifs centered on the modified nucleotide; the motif will not be attached if the [...]. By default, the motif selected is RRACH: c("AAACA","GAACA","AGACA","GGACA","AAACT","GAACT","AGACT","GGACT","AAACC","GAACC","AGACC","GGACC"). |
HK_genes_list |
Optional; a character string of the gene IDs of the housekeeping genes. The gene IDs should correspond to the gene IDs used by the provided txdb. The Entrez gene IDs of the housekeeping genes of mm10 and hg19 are included in this package.
This function retrieves transcript-related features that were previously known to be related to m6A modifications, based on the provided rowRanges of the SummarizedExperiment, and it returns the features as metadata columns of the SummarizedExperiment.
The following features are always included:
###1. Transcript regions ###
The entries are logical / dummy variables.
- UTR5: 5'UTR.
- UTR3: 3'UTR.
- CDS: Coding Sequence.
- Stop_codons: Stop codon (301 bp window centered on the stop codon).
- Start_codons: Start codon (201 bp window centered on the start codon).
- m6Am: 5' cap m6Am (TSS whose underlying sequence is A).
- Exons: Exonic regions.
- Last_exons_50bp: First 50 bp of the last exon of a transcript.
###2. Relative positions ###
The entries fall in the range [0,1]. If a site does not map to the corresponding region, the value is set to 0 (this can be viewed as an interaction term on top of the region model).
- Pos_UTR5: Relative positioning on 5'UTR.
- Pos_UTR3: Relative positioning on 3'UTR.
- Pos_CDS: Relative positioning on Coding Sequence.
- Pos_Tx: Relative positioning on Transcript.
- Pos_exons: Relative positioning on exons.
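One possible reading of the [0,1] scale is sketched below in base R. The helper `relative_position` is hypothetical and not part of the package; it assumes a single contiguous region per site, measures the offset from the 5' end, and returns 0 for unmapped sites. The package's actual computation (for example across multi-exon transcripts) is not shown on this page.

```r
# Hypothetical helper (an assumption, not the package's code):
# relative position of a site inside one contiguous region, measured 5' -> 3'.
relative_position <- function(site, start, end, strand = "+") {
  # A site is "mapped" only if a region exists and contains it
  inside <- !is.na(start) & site >= start & site <= end
  # Guard against zero-length spans (single-nucleotide regions)
  span <- pmax(end - start, 1)
  # On the minus strand the 5' end is the larger coordinate
  offset <- ifelse(strand == "-", end - site, site - start)
  # Unmapped sites get 0, as described above
  ifelse(inside, offset / span, 0)
}

pos <- relative_position(
  site   = c(125, 125, 500),
  start  = c(100, 100, NA),
  end    = c(200, 200, NA),
  strand = c("+", "-", "+")
)
print(pos)
```

Because unmapped sites receive 0, a value of 0 is ambiguous on its own; it is distinguished from a true 5'-end position only in combination with the matching region dummy variable.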
###3. Region length ###
- long_UTR3: Long 3'UTR (length > 400bp).
- long_exon: Long exon (length > 400bp).
- Gene_length_ex: standardized gene length of exonic regions (z score).
- Gene_length_all: standardized gene length of all regions (z score).
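The length features above combine a threshold and a z score. A minimal base-R sketch follows, using made-up lengths; the input vectors and the helper `z_score` are illustrative assumptions, and whether the package applies any transformation before standardizing is not stated here.

```r
# Made-up example lengths in bp (assumption, for illustration only)
utr3_length    <- c(120, 450, 900, 400)
exonic_length  <- c(1500, 3200, 800, 2500)

# Dummy variable: strictly greater than 400 bp, as in "length > 400bp"
long_UTR3 <- utr3_length > 400

# Hypothetical helper: standardize to mean 0 and standard deviation 1
z_score <- function(x) (x - mean(x)) / sd(x)
Gene_length_ex <- z_score(exonic_length)

print(long_UTR3)
print(round(Gene_length_ex, 3))
```

Note that a 3'UTR of exactly 400 bp is not flagged as long under the strict inequality.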
#####=============== The following features are optional ===============#####
###4. Motif ###
By default, the following motifs are searched: c("AAACA","GAACA","AGACA","GGACA","AAACT","GAACT","AGACT","GGACT","AAACC","GAACC","AGACC","GGACC"), i.e. the instances of RRACH.
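The default vector corresponds to expanding the IUPAC pattern RRACH, where R is A or G and H is A, C or T. As a sketch, the twelve motifs can be enumerated in base R; the helper `expand_rrach` is hypothetical and not a function of this package.

```r
# Hypothetical helper (not from the package): enumerate all instances of RRACH.
expand_rrach <- function() {
  purines <- c("A", "G")        # IUPAC code R
  not_g   <- c("A", "T", "C")   # IUPAC code H
  # expand.grid varies the first column fastest
  grid <- expand.grid(r1 = purines, r2 = purines, h = not_g,
                      stringsAsFactors = FALSE)
  paste0(grid$r1, grid$r2, "AC", grid$h)
}

motifs <- expand_rrach()
print(motifs)
```

A custom vector built the same way for another consensus could then be passed through the motif argument.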
###5. Evolutionary fitness ###
- PC 1nt: standardized phastCons (PC) score 1 nt.
- PC 201nt: standardized phastCons (PC) score 101 nt.
- FC 1nt: standardized fitness consequences (FC) score 1 nt.
- FC 5nt: standardized fitness consequences (FC) score 101 nt.
###6. User specified features by argument feature_lst ###
The entries are logical / dummy variables, indicating whether each site overlaps the corresponding GRanges or GRangesList.
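The overlap dummy variables can be pictured with a simplified base-R sketch. The helper `overlaps_any` and the coordinates are illustrative assumptions; it treats all sites and ranges as lying on the same chromosome and strand, which real GRanges overlap queries (for example `%over%` in GenomicRanges) would check as well. How the package performs the overlap is not shown on this page.

```r
# Hypothetical helper (an assumption, not the package's code):
# TRUE when a site position falls inside at least one closed interval.
overlaps_any <- function(site_pos, range_start, range_end) {
  vapply(site_pos,
         function(p) any(range_start <= p & p <= range_end),
         logical(1))
}

# Made-up coordinates on a single chromosome and strand
sites         <- c(105, 250, 999)
feature_start <- c(100, 240)
feature_end   <- c(110, 260)

hits <- overlaps_any(sites, feature_start, feature_end)
print(hits)
```

Each element of feature_lst would yield one such logical column, named after the list element.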
###7. Gene attribute ###
- sncRNA: small noncoding RNA (<= 200bp)
- lncRNA: long noncoding RNA (> 200bp)
- Isoform_num: Number of transcript isoforms, standardized as a z score.
- HK_genes: mapped to housekeeping genes, as defined in the paper below.
Eisenberg E, Levanon EY (October 2013). "Human housekeeping genes, revisited". Trends in Genetics. 29
###8. Batch effect ###
- GC_cont_genes: GC content of each gene.
- GC_cont_101bp: GC content of 101bp local region of the sites.
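GC content is the fraction of G and C bases in a sequence. The base-R sketch below uses a hypothetical helper `gc_content` on plain character strings; extracting the gene or 101 bp sequences from the genome supplied through bsgnm is assumed to have happened already and is not shown.

```r
# Hypothetical helper (not from the package): fraction of G/C per sequence.
gc_content <- function(seqs) {
  seqs <- toupper(seqs)
  # Remove every character that is not G or C, then count what remains
  gc_count <- nchar(gsub("[^GC]", "", seqs))
  gc_count / nchar(seqs)
}

gc <- gc_content(c("GGACT", "AAACA", "gcgc"))
print(gc)
```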
This function returns a SummarizedExperiment object whose mcols hold the feature (design) matrix.
See logistic.modeling to perform model selection, statistics calculation, and visualization across multiple samples.
### ==== For hg19 ==== ###
library(SummarizedExperiment)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
library(BSgenome.Hsapiens.UCSC.hg19)
library(fitCons.UCSC.hg19)
library(phastCons100way.UCSC.hg19)
# User-specified genomic features, passed through the feature_lst argument
Feature_List_hg19 <- list(
  HNRNPC_eCLIP = eCLIP_HNRNPC_gr,
  YTHDC1_TREW = YTHDC1_TREW_gr,
  YTHDF1_TREW = YTHDF1_TREW_gr,
  YTHDF2_TREW = YTHDF2_TREW_gr,
  miR_targeted_genes = miR_targeted_genes_grl,
  # miRanda = miRanda_hg19_gr,
  TargetScan = TargetScan_hg19_gr,
  Verified_miRtargets = verified_targets_gr
)
SE_features_added <- predictors_annot_old(
  se = SE_example,
  txdb = TxDb.Hsapiens.UCSC.hg19.knownGene,
  bsgnm = Hsapiens,
  fc = fitCons.UCSC.hg19,
  pc = phastCons100way.UCSC.hg19,
  struct_hybridize = Struc_hg19,
  feature_lst = Feature_List_hg19,
  HK_genes_list = HK_hg19_eids
)
mcols(SE_features_added) # Check the generated feature matrix.
#ToDo1: add argument Reduce_GenomicFeature_Colinearity.
#ToDo2: add argument Reduce_GenomicResponse_Dependency.
#ToDo3: the sample_names_coldata is very confusing.
#ToDo4: must support the input format of matrix and TRUE/FALSE for logistic regression.
#ToDo5: Response could be ordinary, binomial, and poisson.
#Features need to change into:
# 1. change fc and pc into z scores.
# 2. change last exon 50 bp into last exon relative position centered at 0.
# 3. transcripts whose stop codon falls in the last exon.
# 4. add last exon dummy.
# 5. add relative exonic rank 0-1.
# 6. add introns.
# 7. add relative intronic positions.
# 8. add relative intronic rank 0-1.
# 9. add splicing junction 5' 50bp exons.
# 10. add splicing junction 3' 50bp exons.
# 11. add splicing junction 5' 50bp introns.
# 12. add splicing junction 3' 50bp introns.
# 13. add all relative positions in MAD standardized absolute bp 5' end, absolute bp 3' end.
# add another 30 features.