compute_kmer: Compute k-mer Features


View source: R/LncFinder.R

Description

Computes the k-mer frequencies (or counts) of sequences.

Usage

compute_kmer(
  Sequences,
  label = NULL,
  k = 1:5,
  step = 1,
  freq = TRUE,
  improved.mode = FALSE,
  alphabet = c("a", "c", "g", "t"),
  on.ORF = FALSE,
  auto.full = FALSE,
  parallel.cores = 2
)

Arguments

Sequences

Sequences loaded from a FASTA file with the function read.fasta of the seqinr package.

label

Optional. String. The label of the sequences, such as "NonCoding" or "Coding".

k

Integer or integer vector. The length(s) of the k-mers, i.e. the size of the sliding window. (Default: 1:5)

step

Integer. The step by which the sliding window moves along the sequence. (Default: 1)

freq

Logical. If TRUE, the frequencies of the k-mer patterns are returned instead of their counts. (Default: TRUE)

improved.mode

Logical. If TRUE, the frequencies will be normalized using the method proposed by PLEK (Li et al. 2014). Ignored if freq = FALSE. (Default: FALSE)

alphabet

A vector of single characters specifying the characters that can occur in the sequences. (Default: alphabet = c("a", "c", "g", "t"))

on.ORF

Logical. If TRUE, the k-mer frequencies will be calculated on the longest ORF region. NOTE: If TRUE, the sequences have to be DNA. (Default: FALSE)

auto.full

Logical. Used when on.ORF = TRUE but no ORF can be found in a sequence: if auto.full = TRUE, the k-mer frequencies are calculated on the full sequence instead; if auto.full = FALSE, sequences without an ORF are discarded. Ignored when on.ORF = FALSE. (Default: FALSE)

parallel.cores

Integer. The number of cores used for parallel computation. Set to -1 to use all available cores. (Default: 2)
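The on.ORF and auto.full options can be pictured with the following base-R sketch. It is not LncFinder's implementation: the helper longest_orf() is hypothetical, and it assumes that an ORF runs from an ATG start codon to the first in-frame stop codon (TAA, TAG or TGA) on the forward strand only. LncFinder's own ORF definition may differ in details. The input is assumed to be a character vector of single bases, as returned by read.fasta of the seqinr package.

```r
# Hypothetical helper (not part of LncFinder): longest forward-strand ORF.
# Assumption: ORF = ATG ... first in-frame stop codon (TAA/TAG/TGA).
longest_orf <- function(seq_chars, auto.full = FALSE) {
  s <- tolower(paste(seq_chars, collapse = ""))
  n <- nchar(s)
  stops <- c("taa", "tag", "tga")
  best <- ""
  for (frame in 0:2) {
    if (n - frame < 3) next  # not even one full codon in this frame
    # Split the reading frame into consecutive codons
    pos <- seq(1 + frame, n - 2, by = 3)
    codons <- substring(s, pos, pos + 2)
    open <- NA  # codon index of the currently open ATG, or NA
    for (i in seq_along(codons)) {
      if (is.na(open) && codons[i] == "atg") {
        open <- i
      } else if (!is.na(open) && codons[i] %in% stops) {
        orf <- paste(codons[open:i], collapse = "")
        if (nchar(orf) > nchar(best)) best <- orf
        open <- NA
      }
    }
  }
  if (nchar(best) > 0) return(best)
  # No ORF found: fall back to the full sequence only if auto.full = TRUE
  if (auto.full) s else NA_character_
}
```

For example, longest_orf(strsplit("ccatgaaatagc", "")[[1]]) returns "atgaaatag", while a sequence without any ORF yields NA (the sequence would be discarded) unless auto.full = TRUE, in which case the full sequence is returned.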

Details

This function extracts k-mer features; both k and step can be customized. Either the counts (freq = FALSE) or the frequencies (freq = TRUE) of the k-mer patterns are returned. When freq = TRUE, improved.mode can additionally be enabled to apply the normalization proposed for PLEK (Ref: Li et al. 2014).
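The counting described above can be sketched in base R as follows. The helper kmer_features() is hypothetical and handles a single sequence; it is meant to show how k, step, freq and improved.mode interact, not to reproduce LncFinder's output exactly (column names and ordering, for example, may differ). The improved mode uses PLEK's weight w_k = 1/4^(5 - k), generalized here, as an assumption, to 1/|alphabet|^(max(k) - k), which gives the same weights for the default alphabet and k = 1:5.

```r
# Hypothetical helper (not part of LncFinder): k-mer features of one sequence.
kmer_features <- function(seq_chars, k = 1:5, step = 1, freq = TRUE,
                          improved.mode = FALSE,
                          alphabet = c("a", "c", "g", "t")) {
  s <- tolower(seq_chars)
  n <- length(s)
  res <- numeric(0)
  for (kk in k) {
    # Every possible k-mer over the alphabet, in lexicographic order
    words <- alphabet
    if (kk > 1) for (j in 2:kk) words <- as.vector(t(outer(words, alphabet, paste0)))
    counts <- setNames(numeric(length(words)), words)
    if (n >= kk) {
      # Windows of length kk whose start positions advance by `step`
      starts <- seq(1, n - kk + 1, by = step)
      kmers <- vapply(starts, function(p) paste(s[p:(p + kk - 1)], collapse = ""),
                      character(1))
      # k-mers containing characters outside `alphabet` become NA and are not counted
      counts[] <- as.numeric(table(factor(kmers, levels = words)))
    }
    if (freq && sum(counts) > 0) {
      counts <- counts / sum(counts)
      # Assumed PLEK-style weight: 1 / |alphabet|^(max(k) - k)
      if (improved.mode) counts <- counts / length(alphabet)^(max(k) - kk)
    }
    res <- c(res, counts)
  }
  res
}
```

For instance, with the sequence "acgtac", k = 2, step = 2 and freq = FALSE, the windows start at positions 1, 3 and 5, so "ac" is counted twice and "gt" once.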

Value

A data frame of k-mer features.

Author(s)

HAN Siyu

Examples

## Not run: 
data(demo_DNA.seq)
Seqs <- demo_DNA.seq

kmer_res1 <- compute_kmer(Seqs, k = 1:5, step = 1, freq = TRUE, improved.mode = FALSE)

kmer_res2 <- compute_kmer(Seqs, k = 1:5, step = 3, freq = TRUE,
                          improved.mode = TRUE, on.ORF = TRUE, auto.full = TRUE)

## End(Not run)

LncFinder documentation built on Dec. 11, 2021, 9:39 a.m.