make_referFreq | R Documentation |
This function is used to calculate the frequencies of lncRNAs and CDs.
The frequencies file can be used to calculate Logarithm-Distance (compute_LogDistance), Euclidean-Distance (compute_EucDistance), and hexamer score (compute_hexamerScore).
NOTE: To make a frequencies file for building a new LncFinder classifier with function extract_features, please refer to function make_frequencies.
make_referFreq(
cds.seq,
lncRNA.seq,
k = 6,
step = 1,
alphabet = c("a", "c", "g", "t"),
on.orf = TRUE,
ignore.illegal = TRUE
)
cds.seq |
Coding sequences (mRNA without UTRs). Can be a FASTA file loaded
by |
lncRNA.seq |
Long non-coding RNA sequences. Can be a FASTA file loaded by
|
k |
An integer that indicates the sliding window size. (Default: |
step |
Integer defaulting to |
alphabet |
A vector of single characters that specify the different character
of the sequence. (Default: |
on.orf |
Logical. Incomplete CDs can lead to a false shift and
inaccurate hexamer frequencies. When |
ignore.illegal |
Logical. If |
This function is used to make the frequencies file for the computation of Logarithm-Distance (compute_LogDistance), Euclidean-Distance (compute_EucDistance), and hexamer score (compute_hexamerScore).
To achieve high accuracy, mRNA should not be regarded as CDs and assigned to parameter cds.seq. However, the CDs of some species may be insufficient for calculating frequencies. In that case, mRNAs can be regarded as CDs with parameter on.orf = TRUE, and the frequencies will be calculated on the ORF region.
If on.orf = TRUE, users can set step = 3 to simulate the translation process.
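To illustrate how the window size k and the step interact, here is a minimal base-R sketch. The helper count_kmers is hypothetical and is not part of LncFinder; it only mimics the general idea of sliding-window k-mer counting and makes no claim about how make_referFreq is implemented internally.

```r
# Hypothetical helper (NOT a LncFinder function): count k-mers in one
# sequence using a sliding window of size k that advances by `step`.
count_kmers <- function(x, k = 6, step = 1) {
  x <- tolower(x)
  n <- nchar(x)
  if (n < k) return(table(character(0)))   # sequence shorter than one window
  starts <- seq(1, n - k + 1, by = step)   # window start positions
  kmers  <- substring(x, starts, starts + k - 1)
  table(kmers)
}

toy <- "atggcctaa"                          # toy ORF: atg gcc taa

overlapping <- count_kmers(toy, k = 3, step = 1)  # every position
codonwise   <- count_kmers(toy, k = 3, step = 3)  # in-frame windows only

# Relative frequencies are the counts divided by their total.
prop.table(codonwise)
```

With step = 1 the windows overlap and span all three reading frames; with step = 3 the windows starting at the first base follow the codon boundaries, which is the sense in which step = 3 resembles translation.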
Returns a list consisting of the frequencies of protein-coding sequences and non-coding sequences.
Siyu Han, Yanchun Liang, Qin Ma, Yangyi Xu, Yu Zhang, Wei Du, Cankun Wang & Ying Li. LncFinder: an integrated platform for long non-coding RNA identification utilizing sequence intrinsic composition, structural information, and physicochemical property. Briefings in Bioinformatics, 2019, 20(6):2009-2027.
HAN Siyu
make_frequencies, compute_LogDistance, compute_EucDistance, compute_hexamerScore.
## Not run:
# Load a sample FASTA file with seqinr
Seqs <- seqinr::read.fasta(file =
  "http://www.ncbi.nlm.nih.gov/WebSub/html/help/sample_files/nucleotide-sample.txt")

# The same sample is passed to both arguments here for demonstration only
referFreq <- make_referFreq(cds.seq = Seqs, lncRNA.seq = Seqs, k = 6, step = 1,
                            alphabet = c("a", "c", "g", "t"), on.orf = TRUE,
                            ignore.illegal = TRUE)
## End(Not run)