computeMotifs | R Documentation |
Counts the occurrences of motifs in RNA or protein sequences. The motifs employed by the tool "rpiCOOL" can be selected, and new motifs can also be defined.
computeMotifs(
seqs,
seqType = c("RNA", "Pro"),
motifRNA = c("rpiCOOL", "Fox1", "Nova", "Slm2", "Fusip1", "PTB", "ARE", "hnRNPA1",
"PUM", "U1A", "HuD", "QKI", "U2B", "SF1", "HuR", "YB1", "AU", "UG", "selected5"),
motifPro = c("rpiCOOL", "E", "H", "K", "R", "H_R", "EE", "KK", "HR_RH", "RS_SR", "RGG",
"YGG"),
newMotif = NULL,
newMotifOnly = FALSE,
parallel.cores = 2,
cl = NULL
)
seqs |
sequences loaded by function |
seqType |
a string that specifies the nature of the sequence: "RNA" or "Pro" (protein). |
motifRNA |
strings specifying the motifs that are counted in RNA sequences. Ignored if |
motifPro |
strings specifying the motifs that are counted in protein sequences. Ignored if |
newMotif |
list defining new motifs not listed above. New motifs are counted in RNA or protein sequences.
For example, |
newMotifOnly |
logical. If |
parallel.cores |
an integer specifying the number of cores for parallel computation. Default: 2. |
cl |
parallel cores to be passed to this function. |
This function can count the motifs in RNA or protein sequences.
The default motifs are selected or derived from tool "rpiCOOL" (Ref: [2]).
Motifs of RNA
Fox1: UGCAUGU;
Nova: UCAUUUCAC, UCAUUUCAU, CCAUUUCAC, CCAUUUCAU;
Slm2: UAAAC, UAAAA, UAAUC, UAAUA;
Fusip1: AAAGA, AAAGG, AGAGA, AGAGG, CAAGA, CAAGG, CGAGA, CGAGG;
PTB: UUUUU, UUUCU, UCUUU, UCUCU;
ARE: UAUUUAUU;
hnRNPA1: UAGGGU, UAGGGA;
PUM: UGUAAAUA, UGUAGAUA, UGUAUAUA, UGUACAUA;
U1A: AUUGCAC;
HuD: UUAUUU;
QKI: AUUAAU, AUUAAC, ACUAAU, ACUAAC;
U2B: AUUGCAG;
SF1: UACUAAC;
HuR: UUUAUUU, UUUGUUU, UUUCUUU, UUUUUUU;
YB1: CCUGCG, UCUGCG;
AU: AU;
UG: UG.
If "rpiCOOL" is selected, all default motifs will be counted, and there is no need to list other default motifs.
"selected5" indicates the total number of occurrences of PUM, Fox1, U1A, Nova, and ARE, which are regarded as the five most over-represented binding motifs.
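The RNA motif definitions above can be illustrated with a small base-R sketch. It is not LION's implementation: the helper `countPatterns` and the toy sequence are hypothetical, and counting overlapping occurrences is an assumption, since this page does not state how overlaps are handled.

```r
# Hypothetical helper (not part of LION): count the occurrences of each
# pattern in one sequence and sum them. Overlapping matches are counted,
# which is an assumption made for this sketch.
countPatterns <- function(sequence, patterns) {
  total <- 0L
  for (p in patterns) {
    # A zero-width lookahead lets gregexpr report overlapping matches.
    hits <- gregexpr(paste0("(?=", p, ")"), sequence, perl = TRUE)[[1]]
    if (hits[1] != -1L) total <- total + length(hits)
  }
  total
}

# A subset of the default RNA motifs listed above.
rnaMotifs <- list(
  Fox1 = "UGCAUGU",
  Nova = c("UCAUUUCAC", "UCAUUUCAU", "CCAUUUCAC", "CCAUUUCAU"),
  ARE  = "UAUUUAUU",
  PUM  = c("UGUAAAUA", "UGUAGAUA", "UGUAUAUA", "UGUACAUA"),
  U1A  = "AUUGCAC",
  HuD  = "UUAUUU"
)

toySeq <- "GGUGCAUGUCCUAUUUAUUGCACAA"  # made-up sequence for illustration
counts <- sapply(rnaMotifs, function(p) countPatterns(toySeq, p))

# "selected5" is described as the total over PUM, Fox1, U1A, Nova, and ARE.
selected5 <- sum(counts[c("PUM", "Fox1", "U1A", "Nova", "ARE")])
print(counts)
print(selected5)
```

A motif with several patterns (for example Nova) contributes a single count, the sum over its patterns, which matches the one-column-per-motif output described for this function.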
Motifs of protein
E: E;
H: H;
K: K;
R: R;
EE: EE;
KK: KK;
HR ("H_R"): H, R;
HR ("HR_RH"): HR, RH;
RS ("RS_SR"): RS, SR;
RGG: RGG;
YGG: YGG.
If "rpiCOOL", default motifs of rpiCOOL ("E", "K", "H_R", "EE", "KK", "RS_SR", "RGG", and "YGG") will be counted.
There are some minor differences between this function and the extraction scheme of rpiCOOL.
In this function, motifs are scanned directly, whereas in the extraction scheme of rpiCOOL some motifs ("UG", "AU", and "H_R") are scanned in a 10 nt/aa sliding window.
New motif patterns are also supported. Users can pass new patterns to the argument "newMotif" as a list. Format: newMotif = list(*motif_name* = c("*motif_pattern_1*", "*motif_pattern_2*")).
For example: newMotif = list(HR_RH = c("HR", "RH"), RGG = "RGG"). Here "HR_RH" is the name of a motif that contains two patterns: "HR" and "RH".
This function returns a data frame. Row names are sequence names, and column names are motif names.
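The shape of the "newMotif" list and of the returned data frame can be sketched in base R as follows. This is an illustration under stated assumptions, not LION's code: `countMotifTable` is a hypothetical helper, the sequences are plain character strings rather than the objects the package loads, and matches are counted without overlap.

```r
# Hypothetical sketch (not LION's code): turn a named list of motif patterns
# into a data frame with one row per sequence and one column per motif.
countMotifTable <- function(seqs, motifs) {
  # Sum the non-overlapping matches of every pattern of one motif.
  countOne <- function(sequence, patterns) {
    sum(vapply(patterns, function(p) {
      hits <- gregexpr(p, sequence, fixed = TRUE)[[1]]
      if (hits[1] == -1L) 0L else length(hits)
    }, integer(1)))
  }
  # One integer vector (over sequences) per motif name.
  res <- lapply(motifs, function(patterns) {
    vapply(seqs, countOne, integer(1), patterns = patterns)
  })
  data.frame(res, row.names = names(seqs), check.names = FALSE)
}

# Same list format as the "newMotif" argument.
newMotif <- list(HR_RH = c("HR", "RH"), RGG = "RGG")
toySeqs <- c(protA = "MHRHKRGGSRGG", protB = "MKKEE")  # made-up sequences
motifTable <- countMotifTable(toySeqs, newMotif)
print(motifTable)
```

Each list name becomes one column, and the patterns under that name are pooled into a single count, so "HR" and "RH" both feed the "HR_RH" column.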
[1] Han S, Yang X, Sun H, et al. LION: an integrated R package for effective prediction of ncRNA–protein interaction. Briefings in Bioinformatics. 2022; 23(6):bbac420
[2] Akbaripour-Elahabad M, Zahiri J, Rafeh R, et al. rpiCOOL: A tool for In Silico RNA-protein interaction detection using random forest. J. Theor. Biol. 2016; 402:1-8
[3] Pancaldi V, Bahler J. In silico characterization and prediction of global protein-mRNA interactions in yeast. Nucleic Acids Res. 2011; 39:5826-36
[4] Castello A, Fischer B, Eichelbaum K, et al. Insights into RNA Biology from an Atlas of Mammalian mRNA-Binding Proteins. Cell 2012; 149:1393-1406
[5] Ray D, Kazan H, Cook KB, et al. A compendium of RNA-binding motifs for decoding gene regulation. Nature 2013; 499:172-177
[6] Jiang P, Singh M, Coller HA. Computational assessment of the cooperativity between RNA binding proteins and MicroRNAs in Transcript Decay. PLoS Comput. Biol. 2013; 9:e1003075
featureMotifs
data(demoPositiveSeq)
seqsRNA <- demoPositiveSeq$RNA.positive
seqsPro <- demoPositiveSeq$Pro.positive

# Count all default RNA motifs:
motifRNA1 <- computeMotifs(seqsRNA, seqType = "RNA", motifRNA = "rpiCOOL",
                           parallel.cores = 2)

# Count a subset of the default RNA motifs:
motifRNA2 <- computeMotifs(seqsRNA, seqType = "RNA",
                           motifRNA = c("Fox1", "HuR", "ARE"), parallel.cores = 2)

# Count the default protein motifs of rpiCOOL plus "HR_RH":
motifPro1 <- computeMotifs(seqsPro, seqType = "Pro",
                           motifPro = c("rpiCOOL", "HR_RH"), parallel.cores = 2)

# Customized motifs are also supported and can be extracted with default motifs.
# Pass new motif patterns to the "newMotif" argument as a list:
motifPro2 <- computeMotifs(seqsPro, seqType = "Pro", motifPro = c("E", "K", "KK"),
                           newMotif = list(HR_RH = c("HR", "RH"), RGG = "RGG"),
                           parallel.cores = 2)

motifPro3 <- computeMotifs(seqsPro, seqType = "Pro", motifPro = c("rpiCOOL"),
                           newMotif = list(HR_RH = c("HR", "RH"), RGG = "RGG"),
                           parallel.cores = 2)

# Set "newMotifOnly = TRUE" to compute customized motifs only:
motifPro4 <- computeMotifs(seqsPro, seqType = "Pro",
                           newMotif = list(HR_RH = c("HR", "RH"), RGG = "RGG"),
                           newMotifOnly = TRUE, parallel.cores = 2)