featurePSSM: Feature Coding


View source: R/feature.R

Description

A set of functions for extracting features from biological sequences and coding those features as numeric vectors.

Usage

  featurePSSM(seq, start.pos, stop.pos, psiblast.path, database.path)  

Arguments

seq

a character vector of protein, DNA, or RNA sequences.

start.pos

an integer vector denoting the start position of the fragment window.

stop.pos

an integer vector denoting the stop position of the fragment window.

psiblast.path

a string giving the path to the blastpgp program. blastpgp is used to run PSI-BLAST and obtain the Position-Specific Scoring Matrix.

database.path

a string giving the path to a formatted reference database. The database can be formatted with the "formatdb" program.
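
Because a call depends on consistent window positions and an external executable, it can help to check the arguments before running PSI-BLAST. The following is a minimal sketch only: check_pssm_inputs is a hypothetical helper that is not part of BioSeqClass, and it assumes that start.pos and stop.pos carry one entry per sequence (as in the Examples section) and that each window should lie inside its sequence.

```r
# Hypothetical pre-flight check; NOT part of BioSeqClass.
# Assumption: start.pos and stop.pos have one entry per sequence.
check_pssm_inputs <- function(seq, start.pos, stop.pos, psiblast.path) {
  problems <- character(0)

  if (length(start.pos) != length(seq) || length(stop.pos) != length(seq)) {
    problems <- c(problems, "start.pos and stop.pos need one entry per sequence")
  } else if (any(start.pos < 1 | stop.pos > nchar(seq) | start.pos > stop.pos)) {
    problems <- c(problems, "each window must lie inside its sequence")
  }

  # Sys.which() returns "" when the program cannot be located.
  if (!nzchar(Sys.which(psiblast.path))) {
    problems <- c(problems, "blastpgp executable not found")
  }

  problems
}

# The second sequence is shorter than its window, and the program name is
# deliberately one that should not exist on any system.
issues <- check_pssm_inputs(
  seq = c("ACDEFGHIKL", "ACDEF"),
  start.pos = c(1, 1),
  stop.pos = c(10, 10),
  psiblast.path = "no-such-blastpgp-binary-xyz"
)
print(issues)
```

An empty result means that none of the checked conditions failed; it does not guarantee that the database at database.path has been formatted.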

Details

featurePSSM returns a matrix with 20*N+N columns. Each row represents one sequence, coded as a numeric vector of dimension 20*N+N generated by PSI-BLAST. The vector contains two kinds of features: the normalized position-specific scores of the PSSM (Position-Specific Scoring Matrix), and the Shannon entropy of the WOP (weighted observed percentages) at each position. The PSI-BLAST program and a formatted NCBI non-redundant protein database are required.
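
The 20*N+N layout can be illustrated without running PSI-BLAST. The sketch below is not the package's implementation: pssm_features is a hypothetical helper, N is assumed to be the number of positions in the fragment window, the logistic normalization and the base-2 logarithm are assumptions (the documentation does not state which normalization or log base is used), and the input matrices are toy data.

```r
# Hypothetical helper; NOT the BioSeqClass implementation.
# pssm: N x 20 matrix of raw position-specific scores.
# wop:  N x 20 matrix of weighted observed percentages.
pssm_features <- function(pssm, wop) {
  stopifnot(ncol(pssm) == 20, ncol(wop) == 20, nrow(pssm) == nrow(wop))

  # Assumed normalization: logistic squashing of raw scores into (0, 1).
  norm_scores <- 1 / (1 + exp(-pssm))

  # Turn each row of percentages into probabilities.
  totals <- rowSums(wop)
  p <- wop / ifelse(totals > 0, totals, 1)

  # Shannon entropy per position in bits; 0 * log2(0) is treated as 0.
  terms <- ifelse(p > 0, -p * log2(p), 0)
  entropy <- rowSums(terms)

  # 20 scores for each of the N positions, followed by N entropies.
  c(as.vector(t(norm_scores)), entropy)
}

N <- 10
set.seed(1)
pssm <- matrix(sample(-5:5, N * 20, replace = TRUE), nrow = N, ncol = 20)
wop <- matrix(5, nrow = N, ncol = 20)  # uniform 5% for every residue

feat <- pssm_features(pssm, wop)
length(feat)  # 20 * N + N = 210
```

With a uniform WOP row, the entropy at each position is log2(20), the maximum for 20 residue types; a position dominated by a single residue would give a value near 0.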

Author(s)

Hong Li

Examples

if(interactive()){
  file = file.path(path.package("BioSeqClass"), "example", "acetylation_K.fasta")  
  tmp = readAAStringSet(file) 
  proteinSeq = as.character(tmp)
   
  ## Requires the "blastpgp" program and a formatted database. The database can be formatted with the "formatdb" program.
  PSSM1 = featurePSSM(proteinSeq[1:2], start.pos=rep(1,2), stop.pos=rep(10,2), psiblast.path="blastpgp", database.path="./result1.fasta")  
}

Example output

Loading required package: scatterplot3d

BioSeqClass documentation built on April 28, 2020, 9:19 p.m.