compute_EucDistance: Compute Euclidean Distance

View source: R/LncFinder.R

Description

This function computes the Euclidean distance feature proposed by the LncFinder method (Han et al. 2018). The Euclidean distance can be calculated on the full sequence or on the longest ORF region. The step and the size k of the sliding window can also be customized.

Usage

compute_EucDistance(
  Sequences,
  label = NULL,
  referFreq,
  k = 6,
  step = 1,
  alphabet = c("a", "c", "g", "t"),
  on.ORF = FALSE,
  auto.full = FALSE,
  parallel.cores = 2
)

Arguments

Sequences

A FASTA file loaded by function read.fasta of seqinr-package.

label

Optional. String. Indicate the label of the sequences such as "NonCoding", "Coding".

referFreq

A list obtained from function make_referFreq.

k

An integer that indicates the sliding window size. (Default: 6)

step

An integer that indicates the sliding window step. (Default: 1)

alphabet

A vector of single characters that specifies the distinct characters of the sequences. (Default: alphabet = c("a", "c", "g", "t"))

on.ORF

Logical. If TRUE, Euclidean Distance will be calculated on the longest ORF region. NOTE: If TRUE, the input has to be DNA sequences. (Default: FALSE)

auto.full

Logical. When on.ORF = TRUE but no ORF can be found, if auto.full = TRUE, Euclidean Distance will be calculated on full sequences automatically; if auto.full is FALSE, the sequences that have no ORF will be discarded. Ignored when on.ORF = FALSE. (Default: FALSE)

parallel.cores

Integer. The number of cores for parallel computation. Set parallel.cores = -1 to run this function with all cores. (Default: 2)
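How k and step interact can be illustrated with a minimal base-R sketch. The helper slide_kmers below is hypothetical and is not part of LncFinder; it only assumes that a sequence is held as a vector of single lowercase characters, which is how seqinr::read.fasta returns sequences by default.

```r
# Hypothetical helper (not part of LncFinder): list the k-mers visited by a
# sliding window of size k that advances `step` positions at a time.
slide_kmers <- function(seq_chars, k = 6, step = 1) {
  n <- length(seq_chars)
  if (n < k) return(character(0))          # sequence shorter than one window
  starts <- seq(1, n - k + 1, by = step)   # start position of every window
  vapply(starts,
         function(i) paste(seq_chars[i:(i + k - 1)], collapse = ""),
         character(1))
}

toy <- strsplit("atggcgtacgtt", "")[[1]]   # toy 12-nt sequence

win_step1 <- slide_kmers(toy, k = 6, step = 1)  # overlapping windows, shift of 1
win_step3 <- slide_kmers(toy, k = 6, step = 3)  # codon-wise shift of 3

print(win_step1)
print(win_step3)
```

With step = 1 every position starts a new window, whereas step = 3 advances one codon at a time, which is why the larger step pairs naturally with on.ORF = TRUE.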

Details

This function computes the Euclidean distance proposed by LncFinder (Han et al. 2018). LncFinder provides two schemes for calculating the Euclidean distance: 1) step = 3 and k = 6 on the longest ORF region; 2) step = 1 and k = 6 on the full sequence. With compute_EucDistance, step, k, and the calculated region (full sequence or ORF) can all be customized to broaden its applicability.
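The general idea of a k-mer based Euclidean distance can be sketched in base R. This is an illustration under stated assumptions, not the LncFinder implementation: the helpers kmer_freq and euc_dist are hypothetical, the toy reference sequences merely stand in for the kind of frequencies that make_referFreq produces, and k = 2 is used instead of 6 to keep the vectors small.

```r
# Illustrative only: hypothetical helpers, not the LncFinder implementation.

# Relative frequency of every possible k-mer seen by a sliding window.
kmer_freq <- function(seq_chars, k, step, alphabet = c("a", "c", "g", "t")) {
  # Enumerate all k-mers over the alphabet in a fixed order
  grid <- expand.grid(rep(list(alphabet), k), stringsAsFactors = FALSE)
  all_kmers <- apply(grid, 1, paste, collapse = "")
  n <- length(seq_chars)
  starts <- if (n >= k) seq(1, n - k + 1, by = step) else integer(0)
  kmers <- vapply(starts,
                  function(i) paste(seq_chars[i:(i + k - 1)], collapse = ""),
                  character(1))
  # k-mers containing characters outside the alphabet are dropped here
  counts <- table(factor(kmers, levels = all_kmers))
  freq <- as.numeric(counts) / max(sum(counts), 1)
  names(freq) <- all_kmers
  freq
}

# Plain Euclidean distance between two frequency vectors
euc_dist <- function(p, q) sqrt(sum((p - q)^2))

to_chars <- function(s) strsplit(s, "")[[1]]

query <- kmer_freq(to_chars("acgtacgtacgt"),     k = 2, step = 1)
ref_a <- kmer_freq(to_chars("acgtacgtacgtacgt"), k = 2, step = 1)  # toy reference A
ref_b <- kmer_freq(to_chars("aaaaaaaatttttttt"), k = 2, step = 1)  # toy reference B

d_a <- euc_dist(query, ref_a)
d_b <- euc_dist(query, ref_b)
print(c(d_a = d_a, d_b = d_b))
```

In this toy case the query shares its dimer composition with reference A, so its distance to A is smaller than its distance to B. Distances of this kind, measured against reference frequencies of different sequence classes, are what make the feature useful for classification.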

Value

A data frame.

References

Siyu Han, Yanchun Liang, Qin Ma, Yangyi Xu, Yu Zhang, Wei Du, Cankun Wang & Ying Li. LncFinder: an integrated platform for long non-coding RNA identification utilizing sequence intrinsic composition, structural information, and physicochemical property. Briefings in Bioinformatics, 2019, 20(6):2009-2027.

Author(s)

HAN Siyu

See Also

make_referFreq, compute_LogDistance, compute_hexamerScore.

Examples

## Not run: 
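# Load example sequences with seqinr.
Seqs <- seqinr::read.fasta(file =
"http://www.ncbi.nlm.nih.gov/WebSub/html/help/sample_files/nucleotide-sample.txt")

# Build the reference k-mer frequencies. The same sequences are passed for both
# classes only to keep the demo short; use real CDS and lncRNA sets in practice.
referFreq <- make_referFreq(cds.seq = Seqs, lncRNA.seq = Seqs, k = 6, step = 3,
                            alphabet = c("a", "c", "g", "t"), on.orf = TRUE,
                            ignore.illegal = TRUE)

# Demo sequences shipped with LncFinder.
data(demo_DNA.seq)
Sequences <- demo_DNA.seq

# Use the same k and step as for referFreq. With auto.full = TRUE, sequences
# without an ORF are scored on the full sequence instead of being discarded.
EucDistance <- compute_EucDistance(Sequences, label = "NonCoding", referFreq = referFreq,
                                   k = 6, step = 3, alphabet = c("a", "c", "g", "t"),
                                   on.ORF = TRUE, auto.full = TRUE, parallel.cores = 2)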

## End(Not run)

LncFinder documentation built on Dec. 11, 2021, 9:39 a.m.