featureMotifs: Extraction of the Motif Features of RNA and Protein Sequences

Source: R/Motifs.R

Description

A wrapper around the computeMotifs function. This function counts motifs in RNA and protein sequences in a single call and formats the results as a dataset that can be used to build a classifier.

Usage

featureMotifs(
  seqRNA,
  seqPro,
  label = NULL,
  featureMode = c("concatenate", "combine"),
  newMotif.RNA = NULL,
  newMotif.Pro = NULL,
  newMotifOnly.RNA = FALSE,
  newMotifOnly.Pro = FALSE,
  parallel.cores = 2,
  cl = NULL,
  ...
)

Arguments

seqRNA

RNA sequences loaded by the function read.fasta from the seqinr package, or a list of RNA sequences. RNA sequences are converted to lower case.

seqPro

protein sequences loaded by the function read.fasta from the seqinr package, or a list of protein sequences. Protein sequences are converted to upper case.

label

optional. A string, a vector of strings, or NULL, indicating the class of the samples, such as "Interact" or "Non.Interact". Default: NULL.

featureMode

a string, either "concatenate" or "combine". If "concatenate", the motif features of RNA and proteins are simply concatenated. If "combine", the returned dataset is formed by combining each RNA motif feature with each protein motif feature. See Details. Default: "concatenate".

newMotif.RNA

a list specifying the motifs to be counted in RNA sequences. Default: NULL. For example, newMotif.RNA = list(hnRNPA1 = c("UAGGGU", "UAGGGA"), SF1 = "UACUAAC"). Can be used together with the argument motifRNA (passed via ...) to count motifs in RNA sequences.

newMotif.Pro

a list specifying the motifs to be counted in protein sequences. Default: NULL. For example, newMotif.Pro = list(YGG = "YGG", E = "E"). Can be used together with the argument motifPro (passed via ...) to count motifs in protein sequences.

newMotifOnly.RNA

logical. If TRUE, only the new motifs defined in newMotif.RNA will be counted. Default: FALSE.

newMotifOnly.Pro

logical. If TRUE, only the new motifs defined in newMotif.Pro will be counted. Default: FALSE.

parallel.cores

an integer giving the number of cores used for parallel computation. Default: 2. Set parallel.cores = -1 to use all available cores. The value must be either -1 or >= 1.

cl

a cluster of parallel workers (e.g. one created by parallel::makeCluster) to be passed to this function. Default: NULL.

...

arguments motifRNA and motifPro, passed on to computeMotifs to select the default motifs to be counted. See Examples.
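The parallel.cores and cl arguments follow a common pattern in R. The base-R sketch below shows one way such values could be resolved; resolveCores() is a hypothetical helper, not part of this package, and it assumes that -1 simply maps to parallel::detectCores(). It also shows the lifecycle of a cluster created with parallel::makeCluster, which, assuming cl accepts such an object, could be reused across calls.

```r
library(parallel)

# Hypothetical helper (not part of this package): turn a parallel.cores
# value into a concrete core count. Assumes -1 means "all detected cores".
resolveCores <- function(parallel.cores) {
  if (!is.numeric(parallel.cores) || length(parallel.cores) != 1 ||
      is.na(parallel.cores) ||
      !(parallel.cores == -1 || parallel.cores >= 1)) {
    stop("parallel.cores should be == -1 or >= 1")
  }
  if (parallel.cores == -1) {
    n <- detectCores()
    # detectCores() may return NA on some platforms; fall back to one core
    if (is.na(n)) n <- 1L
    return(as.integer(n))
  }
  as.integer(parallel.cores)
}

# A cluster created once can be reused for several computations and
# must be stopped explicitly when it is no longer needed.
cl <- makeCluster(resolveCores(2))
res <- parLapply(cl, 1:4, function(i) i^2)
stopCluster(cl)
```

Creating the cluster outside the function and stopping it yourself avoids paying the worker start-up cost on every call when several feature sets are computed in a row.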

Details

If featureMode = "concatenate", the m RNA motif features are simply concatenated with the n protein motif features, so the result has m + n features. If featureMode = "combine", each of the m RNA motif features is combined with each of the n protein motif features, so the result has m * n features.
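To make the feature counts concrete, the base-R sketch below uses two small count matrices with made-up values and motif names. It reproduces only the dimensions and the pairing of motif names described above; how featureMotifs computes the value of each combined feature is not stated here, so the sketch does not compute combined values.

```r
# Made-up motif counts for 3 RNA-protein pairs (illustration only)
rnaMotifs <- cbind(motif_A = c(1, 0, 2), motif_B = c(0, 3, 1))
proMotifs <- cbind(motif_X = c(4, 0, 1), motif_Y = c(2, 2, 0),
                   motif_Z = c(1, 1, 1))
m <- ncol(rnaMotifs)
n <- ncol(proMotifs)

# "concatenate": RNA and protein columns side by side -> m + n features
concatenated <- cbind(rnaMotifs, proMotifs)

# "combine": one feature per (RNA motif, protein motif) pair -> m * n features
pairs <- expand.grid(rna = colnames(rnaMotifs), pro = colnames(proMotifs),
                     stringsAsFactors = FALSE)
combinedNames <- paste(pairs$rna, pairs$pro, sep = ".")
```

With m = 2 and n = 3 this gives 5 concatenated features and 6 combined ones; the gap widens quickly as more motifs are selected.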

The ... argument can be used to pass the default motif patterns for RNA and protein sequences. See arguments motifRNA and motifPro in computeMotifs.
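Both the user-defined lists (newMotif.RNA, newMotif.Pro) and the default motifs map a motif name to one or more patterns. The base-R sketch below counts such a named list in a single sequence to illustrate that structure. countMotifList() is hypothetical; it assumes that overlapping matches are counted and that the counts of all patterns of one motif are summed, and it changes the case of both sequence and patterns before matching. computeMotifs may count differently.

```r
# Hypothetical sketch: count a named motif list in one sequence.
# Assumptions: overlapping matches are counted, and the counts of all
# patterns belonging to one motif are summed.
countMotifList <- function(sequence, motifs, toCase = tolower) {
  # Accept both a single string and a vector of single letters
  sequence <- toCase(paste(sequence, collapse = ""))
  vapply(motifs, function(patterns) {
    total <- 0
    for (p in toCase(patterns)) {
      # A zero-width lookahead lets gregexpr report overlapping hits
      hits <- gregexpr(paste0("(?=", p, ")"), sequence, perl = TRUE)[[1]]
      if (hits[1] != -1) total <- total + length(hits)
    }
    total
  }, numeric(1))
}

# RNA given as single letters, the form returned by seqinr::read.fasta
rnaSeq <- c("c", "c", "u", "a", "c", "c", "c")
countMotifList(rnaSeq, list(motif1 = c("cc", "cu")))   # returns c(motif1 = 4)

# Protein motifs matched in upper case
countMotifList("mkklrh", list(motif2 = "KK"), toupper)
```

Under these assumptions "cc" occurs three times in "ccuaccc" (the last three letters contain two overlapping hits) and "cu" once, giving 4 for motif1.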

Value

This function returns a data frame. Row names are the sequence names, and column names are the motif names. The names of the RNA and protein sequences are joined with ".", i.e. the row names have the format "RNASequenceName.proteinSequenceName" (e.g. "YDL227C.YOR198C"). If featureMode = "combine", the RNA and protein motif names are also joined with ".", i.e. the column names have the format "motif_RNAMotifName.motif_proteinMotifName" (e.g. "motif_PUM.motif_EE").
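The naming scheme above can be reproduced with paste(). In the sketch below, "rna2" and "pro2" are hypothetical placeholder names, and the element-wise pairing of the i-th RNA with the i-th protein is an assumption made for illustration. The last lines show why splitting the names back apart is only safe when the original names contain no ".".

```r
# "YDL227C" and "YOR198C" come from the example above;
# "rna2" and "pro2" are hypothetical placeholders.
rnaNames <- c("YDL227C", "rna2")
proNames <- c("YOR198C", "pro2")

# Assumption: the i-th RNA is paired with the i-th protein
rowNames <- paste(rnaNames, proNames, sep = ".")

# In "combine" mode the motif names are joined the same way
combinedCol <- paste("motif_PUM", "motif_EE", sep = ".")

# Splitting on "." recovers the parts only if no name contains a "."
parts <- strsplit(rowNames, ".", fixed = TRUE)
```

If sequence names may contain dots, keep the original RNA and protein names in a separate table rather than parsing them back out of the row names.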

References

[1] Han S, Yang X, Sun H, et al. LION: an integrated R package for effective prediction of ncRNA–protein interaction. Briefings in Bioinformatics. 2022; 23(6):bbac420

[2] Akbaripour-Elahabad M, Zahiri J, Rafeh R, et al. rpiCOOL: A tool for In Silico RNA-protein interaction detection using random forest. J. Theor. Biol. 2016; 402:1-8

[3] Pancaldi V, Bahler J. In silico characterization and prediction of global protein-mRNA interactions in yeast. Nucleic Acids Res. 2011; 39:5826-36

[4] Castello A, Fischer B, Eichelbaum K, et al. Insights into RNA Biology from an Atlas of Mammalian mRNA-Binding Proteins. Cell 2012; 149:1393-1406

[5] Ray D, Kazan H, Cook KB, et al. A compendium of RNA-binding motifs for decoding gene regulation. Nature 2013; 499:172-177

[6] Jiang P, Singh M, Coller HA. Computational assessment of the cooperativity between RNA binding proteins and MicroRNAs in Transcript Decay. PLoS Comput. Biol. 2013; 9:e1003075

See Also

computeMotifs

Examples

data(demoPositiveSeq)
seqsRNA <- demoPositiveSeq$RNA.positive
seqsPro <- demoPositiveSeq$Pro.positive

# Concatenate user-defined motifs with selected default motifs
dataset1 <- featureMotifs(seqRNA = seqsRNA, seqPro = seqsPro,
                          featureMode = "concatenate",
                          newMotif.RNA = list(motif1 = c("cc", "cu")),
                          newMotif.Pro = list(motif2 = "KK"),
                          motifRNA = c("Fusip1", "AU", "UG"),
                          motifPro = c("E", "K", "HR_RH"))

# Combine RNA and protein motifs pairwise; for RNA only motif1 is counted,
# for proteins motif2 is counted together with the default protein motifs
dataset2 <- featureMotifs(seqRNA = seqsRNA, seqPro = seqsPro,
                          featureMode = "combine",
                          newMotif.RNA = list(motif1 = c("cc", "cu")),
                          newMotif.Pro = list(motif2 = c("R", "H")),
                          newMotifOnly.RNA = TRUE, newMotifOnly.Pro = FALSE)


HAN-Siyu/ncProR documentation built on Nov. 3, 2023, 12:08 a.m.