PSTNPss_DNA: Position-Specific Trinucleotide Propensity based on...


View source: R/PSTNPss_DNA.R

Description

This function takes a positive data set, a negative data set, and a set of input sequences, and returns a matrix of feature vectors with one row per input sequence. For an input sequence of length L, the feature vector is [u(1), u(2), ..., u(L-2)]. To compute u(1), the function takes the frequency of positive-set sequences that start with the same trinucleotide as the input sequence and subtracts the frequency of negative-set sequences that start with that trinucleotide. Each u(i) is computed in the same way, using the trinucleotide at position i instead of the first one.
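The calculation described above can be sketched in base R. This is only an illustration of the idea, not the ftrCOOL implementation: the helper names tri_freq and pstnp_sketch are hypothetical, the toy sequences are made up, and the sketch assumes equal-length, uppercase sequences containing only A, C, G, and T.

```r
# Hypothetical helper (not part of ftrCOOL): position-specific trinucleotide
# frequencies. Returns a 64 x (L-2) matrix whose entry [t, i] is the fraction
# of sequences that have trinucleotide t at position i.
tri_freq <- function(seqs) {
  L <- nchar(seqs[1])
  nuc <- c("A", "C", "G", "T")
  tris <- apply(expand.grid(nuc, nuc, nuc, stringsAsFactors = FALSE),
                1, paste0, collapse = "")
  freq <- matrix(0, nrow = length(tris), ncol = L - 2,
                 dimnames = list(tris, NULL))
  for (i in seq_len(L - 2)) {
    tab <- table(substr(seqs, i, i + 2))
    known <- intersect(names(tab), tris)
    freq[known, i] <- as.numeric(tab[known]) / length(seqs)
  }
  freq
}

# Hypothetical sketch of the feature computation:
# u(i) = positive frequency minus negative frequency of the trinucleotide
# found at position i of the input sequence.
pstnp_sketch <- function(seqs, pos, neg) {
  f_pos <- tri_freq(pos)
  f_neg <- tri_freq(neg)
  L <- nchar(seqs[1])
  out <- matrix(0, nrow = length(seqs), ncol = L - 2)
  for (i in seq_len(L - 2)) {
    tri <- substr(seqs, i, i + 2)
    out[, i] <- f_pos[tri, i] - f_neg[tri, i]
  }
  out
}

# Toy data (assumed for illustration only), all of length 5
pos  <- c("ACGTA", "ACGTT", "TCGTA")
neg  <- c("ACGAA", "TTGTA", "TCGTA")
seqs <- c("ACGTA", "TCGAA")

feat <- pstnp_sketch(seqs, pos, neg)
dim(feat)  # one row per input sequence, L - 2 = 3 columns
```

With these toy sets, the first input sequence starts with ACG, which opens two of the three positive sequences and one of the three negative sequences, so its u(1) is 2/3 - 1/3 = 1/3.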

Usage

PSTNPss_DNA(seqs, pos, neg, label = c())

Arguments

seqs

is a FASTA file containing nucleotide sequences, in which each sequence record begins with a header line starting with '>'. Alternatively, seqs can be a string vector in which each element is a nucleotide sequence.

pos

is a FASTA file containing nucleotide sequences, in which each sequence record begins with a header line starting with '>'. Alternatively, the value of this parameter can be a string vector. These are the positive sequences used to train the model.

neg

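is a FASTA file containing nucleotide sequences, in which each sequence record begins with a header line starting with '>'. Alternatively, the value of this parameter can be a string vector. These are the negative sequences used to train the model.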

label

is an optional parameter. It is a vector whose length equals the number of sequences, and it gives the class of each entry (i.e., each sequence).
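Because seqs, pos, and neg accept either a FASTA file or a string vector, it can help to see how the two forms relate. The following is a minimal sketch, assuming well-formed FASTA text already read into lines; read_fasta_lines is a hypothetical helper and is not how ftrCOOL itself parses files.

```r
# Hypothetical helper (not part of ftrCOOL): turn FASTA-formatted lines into
# a named character vector with one element per sequence.
read_fasta_lines <- function(lines) {
  lines <- trimws(lines)
  lines <- lines[nzchar(lines)]            # drop empty lines
  is_header <- startsWith(lines, ">")
  record <- cumsum(is_header)              # record index of every line
  ids <- sub("^>", "", lines[is_header])
  # Join the sequence lines that belong to the same record
  body <- tapply(lines[!is_header], record[!is_header],
                 paste0, collapse = "")
  seqs <- as.vector(body)
  names(seqs) <- ids[as.integer(names(body))]
  seqs
}

# Made-up FASTA content: the first record spans two lines
fasta_text <- c(">seq1", "ACGT", "ACGA", ">seq2", "TTGCATGC", "")
parsed <- read_fasta_lines(fasta_text)
parsed
```

A vector such as parsed has the shape the string-vector form of these arguments describes: each element is one nucleotide sequence.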

Value

The function returns a feature matrix. The number of rows equals the number of sequences, and the number of columns equals the sequence length minus two.

Note

The sequences in the positive data set, the negative data set, and the input set should all have the same length.
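One way to verify this requirement before calling the function is a small pre-flight check on string-vector inputs. This is a sketch only: check_equal_length is a hypothetical helper, not an ftrCOOL function, and the sequences shown are made up.

```r
# Hypothetical pre-flight check (not part of ftrCOOL): stop early when the
# three sets do not share a single sequence length.
check_equal_length <- function(seqs, pos, neg) {
  lens <- unique(nchar(c(seqs, pos, neg)))
  if (length(lens) != 1) {
    stop("Sequences have differing lengths: ",
         paste(sort(lens), collapse = ", "))
  }
  invisible(lens)  # the common length, returned invisibly
}

# Made-up sequences, all of length 4, so the check passes
common_len <- check_equal_length(seqs = c("ACGT"),
                                 pos  = c("AAAA", "CCCC"),
                                 neg  = c("GGGG"))
```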

Examples

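library(ftrCOOL)

# Directory holding the example data shipped with ftrCOOL
ptmSeqsADR <- system.file("extdata/", package = "ftrCOOL")

# Read the positive, negative, and test sequences
posSeqs <- fa.read(file = paste0(ptmSeqsADR, "/posDNA.txt"), alphabet = "dna")
negSeqs <- fa.read(file = paste0(ptmSeqsADR, "/negDNA.txt"), alphabet = "dna")
seqs <- fa.read(file = paste0(ptmSeqsADR, "/DNA_testing.txt"), alphabet = "dna")

# Build the feature matrix (one row per test sequence)
mat <- PSTNPss_DNA(seqs = seqs, pos = posSeqs, neg = negSeqs)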

ftrCOOL documentation built on Nov. 30, 2021, 1:07 a.m.
