featureFreq | R Documentation |
Basically a wrapper for the computeFreq function.
This function calculates the k-mer frequencies of RNA and protein sequences at the same time
and formats the results as a dataset that can be used to build a classifier.
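As a rough illustration of what a k-mer frequency is, the sketch below counts overlapping k-mers in a single sequence using base R only. The helper name kmer_freq, its arguments, and the choice to divide counts by the number of windows are assumptions made for this illustration; they are not taken from the package, which may count or scale differently.

```r
# Hypothetical helper (not part of the package): frequencies of overlapping
# k-mers in one sequence, over a fixed alphabet.
kmer_freq <- function(seq, k, alphabet) {
  chars <- strsplit(seq, "")[[1]]
  n <- length(chars) - k + 1  # number of sliding windows of width k

  # Enumerate every possible k-mer so that absent k-mers get frequency 0
  grid <- expand.grid(rep(list(alphabet), k), stringsAsFactors = FALSE)
  all_kmers <- apply(grid, 1, paste, collapse = "")

  if (n < 1) {
    # Sequence shorter than k: no windows, all frequencies are 0
    return(setNames(numeric(length(all_kmers)), all_kmers))
  }

  # Extract the k-mer starting at each position
  kmers <- vapply(seq_len(n),
                  function(i) paste(chars[i:(i + k - 1)], collapse = ""),
                  character(1))

  counts <- table(factor(kmers, levels = all_kmers))
  setNames(as.numeric(counts) / n, all_kmers)
}

freq <- kmer_freq("acgtacgt", k = 2, alphabet = c("a", "c", "g", "t"))
freq[c("ac", "cg", "gt", "ta")]
```

Enumerating the full set of k-mers up front keeps the feature vector the same length for every sequence, which is what makes the result usable as one row of a dataset.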
featureFreq(
seqRNA,
seqPro,
label = NULL,
featureMode = c("concatenate", "combine"),
computePro = c("RPISeq", "DeNovo", "rpiCOOL"),
k.Pro = 3,
k.RNA = 4,
EDP = FALSE,
normalize = c("none", "row", "column"),
normData = NULL,
parallel.cores = 2,
cl = NULL
)
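Several arguments in the usage above list their allowed values as a character vector, and the examples on this page pass abbreviations such as "comb" and "conc". One common R idiom that behaves this way is match.arg. The sketch below shows that idiom with a hypothetical function, pick_mode; whether featureFreq uses match.arg internally is an assumption.

```r
# Hypothetical function mirroring the default vector of featureMode.
pick_mode <- function(featureMode = c("concatenate", "combine")) {
  # match.arg picks the first choice when the argument is omitted and
  # expands an unambiguous prefix to the full value
  match.arg(featureMode)
}

pick_mode()        # argument omitted
pick_mode("comb")  # abbreviated value
pick_mode("conc")  # abbreviated value
```

With this idiom, a prefix shared by both choices (for example "co") is rejected with an error, so abbreviations should be long enough to be unambiguous.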
seqRNA |
RNA sequences loaded by function |
seqPro |
protein sequences loaded by function |
label |
optional. A string or a vector of strings or |
featureMode |
a string that can be "concatenate" or "combine". |
computePro |
a string that specifies the computation mode of protein sequences: "RPISeq", "DeNovo" or "rpiCOOL". |
k.Pro |
an integer that indicates the sliding window step of protein sequences. Default: 3 |
k.RNA |
an integer that indicates the sliding window step of RNA sequences. Default: 4 |
EDP |
logical. If |
normalize |
can be "none", "row" or "column". |
normData |
is the normalization data generated by this function.
If the input dataset is training set, or normalize strategy is |
parallel.cores |
an integer specifying the number of cores for parallel computation. Default: 2 |
cl |
parallel cores to be passed to this function. |
See computeFreq.
If normalize = "none" or normalize = "row", this function returns a data frame.
Row names are sequence names, and column names are polymer names.
The names of RNA and protein sequences are separated with ".",
i.e. row name format: "RNASequenceName.proteinSequenceName" (e.g. "YDL227C.YOR198C").
If featureMode = "combine", the polymers of RNA and protein sequences are also separated with ".",
i.e. column name format: "RNAPolymerName.proteinPolymerName" (e.g. "aa.CCA").
If normalize = "column", the function returns a list containing the features (a data frame named "feature")
and the normalization values (a list named "normData"), which are needed when extracting features for test sets.
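The reason column normalization returns "normData" is that a test set has to be scaled with the values learned from the training set, not with its own. The base R sketch below illustrates that workflow. The helper normalize_columns, the use of min-max scaling, and the min/max layout of its normData list are all assumptions for illustration; the actual scaling scheme and normData structure used by the package are not described on this page.

```r
# Hypothetical helper: min-max scale each column. If normData is supplied,
# reuse its stored ranges instead of computing them from the input.
normalize_columns <- function(features, normData = NULL) {
  if (is.null(normData)) {
    normData <- list(min = vapply(features, min, numeric(1)),
                     max = vapply(features, max, numeric(1)))
  }
  span <- normData$max - normData$min
  span[span == 0] <- 1  # avoid division by zero for constant columns

  centered <- sweep(as.matrix(features), 2, normData$min, "-")
  scaled <- as.data.frame(sweep(centered, 2, span, "/"))
  list(feature = scaled, normData = normData)
}

# Toy feature tables standing in for training and test sets
train <- data.frame(aa = c(0.1, 0.3, 0.5), ac = c(0.2, 0.2, 0.6))
test <- data.frame(aa = c(0.2, 0.7), ac = c(0.4, 0.0))

trainOut <- normalize_columns(train)
testOut <- normalize_columns(test, normData = trainOut$normData)
```

Note that scaled test values can fall outside the training range (below 0 or above 1 in this sketch), which is expected when the test set contains values not seen during training.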
[1] Han S, Yang X, Sun H, et al. LION: an integrated R package for effective prediction of ncRNA–protein interaction. Briefings in Bioinformatics. 2022; 23(6):bbac420
[2] Shen J, Zhang J, Luo X, et al. Predicting protein-protein interactions based only on sequences information. Proc. Natl. Acad. Sci. U. S. A. 2007; 104:4337-41
[3] Muppirala UK, Honavar VG, Dobbs D. Predicting RNA-protein interactions using only sequence information. BMC Bioinformatics 2011; 12:489
[4] Wang Y, Chen X, Liu Z-P, et al. De novo prediction of RNA-protein interactions from sequence information. Mol. BioSyst. 2013; 9:133-142
[5] Akbaripour-Elahabad M, Zahiri J, Rafeh R, et al. rpiCOOL: A tool for In Silico RNA-protein interaction detection using random forest. J. Theor. Biol. 2016; 402:1-8
computeFreq
data(demoPositiveSeq)
seqsRNA <- demoPositiveSeq$RNA.positive
seqsPro <- demoPositiveSeq$Pro.positive
dataset1 <- featureFreq(seqRNA = seqsRNA, seqPro = seqsPro, label = "Interact",
featureMode = "comb", computePro = "DeNovo", k.Pro = 3,
k.RNA = 2, normalize = "row", parallel.cores = 2)
# Training set with normalization on column:
dataset2 <- featureFreq(seqRNA = seqsRNA, seqPro = seqsPro, featureMode = "conc",
computePro = "rpiCOOL", k.Pro = 3, k.RNA = 4,
normalize = "column", parallel.cores = 2)
# To build a test set with normalization on column,
# the "normData" of the corresponding training set (generated by this function) is required:
dataset3 <- featureFreq(seqRNA = seqsRNA, seqPro = seqsPro, featureMode = "conc",
computePro = "rpiCOOL", k.Pro = 3, k.RNA = 4,
normalize = "column", normData = dataset2$normData,
parallel.cores = 2)