View source: R/421-extractProtPSSM.R
extractProtPSSM | R Documentation |
Compute PSSM (Position-Specific Scoring Matrix) for given protein sequence
extractProtPSSM(
seq,
start.pos = 1L,
end.pos = nchar(seq),
psiblast.path = NULL,
makeblastdb.path = NULL,
database.path = NULL,
iter = 5,
silent = TRUE,
evalue = 10L,
word.size = NULL,
gapopen = NULL,
gapextend = NULL,
matrix = "BLOSUM62",
threshold = NULL,
seg = "no",
soft.masking = FALSE,
culling.limit = NULL,
best.hit.overhang = NULL,
best.hit.score.edge = NULL,
xdrop.ungap = NULL,
xdrop.gap = NULL,
xdrop.gap.final = NULL,
window.size = NULL,
gap.trigger = 22L,
num.threads = 1L,
pseudocount = 0L,
inclusion.ethresh = 0.002
)
seq |
Character vector, as the input protein sequence. |
start.pos |
Optional integer denoting the start position of the
fragment window. Default is |
end.pos |
Optional integer denoting the end position of the
fragment window. Default is |
psiblast.path |
Character string indicating the path of the
|
makeblastdb.path |
Character string indicating the path of the
|
database.path |
Character string indicating the path of a reference database (a FASTA file). |
iter |
Number of iterations to perform for PSI-Blast. |
silent |
Logical. Whether the PSI-Blast running output
should be shown or not (May not work on some Windows versions and
PSI-Blast versions), default is |
evalue |
Expectation value (E) threshold for saving hits.
Default is |
word.size |
Word size for wordfinder algorithm. An integer >= 2. |
gapopen |
Integer. Cost to open a gap. |
gapextend |
Integer. Cost to extend a gap. |
matrix |
Character string. The scoring matrix name
(default is |
threshold |
Minimum word score such that the word is added to the BLAST lookup table. A real value >= 0. |
seg |
Character string. Filter query sequence with SEG ( |
soft.masking |
Logical. Apply filtering locations as soft masks?
Default is |
culling.limit |
An integer >= 0. If the query range of a hit is
enveloped by that of at least this many higher-scoring hits,
delete the hit. Incompatible with |
best.hit.overhang |
Best Hit algorithm overhang value
(A real value >= 0 and =< 0.5, recommended value: 0.1).
Incompatible with |
best.hit.score.edge |
Best Hit algorithm score edge value
(A real value >=0 and =< 0.5, recommended value: 0.1).
Incompatible with |
xdrop.ungap |
X-dropoff value (in bits) for ungapped extensions. |
xdrop.gap |
X-dropoff value (in bits) for preliminary gapped extensions. |
xdrop.gap.final |
X-dropoff value (in bits) for final gapped alignment. |
window.size |
An integer >= 0. Multiple hits window size,
To specify 1-hit algorithm, use |
gap.trigger |
Number of bits to trigger gapping. Default is |
num.threads |
Integer. Number of threads (CPUs) to use in the
BLAST search. Default is |
pseudocount |
Integer. Pseudo-count value used when constructing PSSM.
Default is |
inclusion.ethresh |
E-value inclusion threshold for pairwise alignments.
Default is |
This function calculates the PSSM (Position-Specific Scoring Matrix) derived by PSI-Blast for given protein sequence or peptides. For given protein sequences or peptides, PSSM represents the log-likelihood of the substitution of the 20 types of amino acids at that position in the sequence. Note that the output value is not normalized.
The original PSSM, a numeric matrix which has
end.pos - start.pos + 1
columns and 20
named rows.
The function requires the makeblastdb
and psiblast
programs
to be properly installed in the operation system or
their paths provided.
The two command-line programs are included in the NCBI-BLAST+ software package. To install NCBI Blast+, just open the NCBI FTP site using web browser or FTP software: ftp://anonymous@ftp.ncbi.nlm.nih.gov:21/blast/executables/blast+/LATEST/ then download the executable version of BLAST+ according to your operation system, and compile or install the downloaded source code or executable program.
Ubuntu/Debian users can directly use the command
sudo apt-get install ncbi-blast+
to install NCBI Blast+.
For OS X users, download ncbi-blast- ... .dmg
then install.
For Windows users, download ncbi-blast- ... .exe
then install.
Altschul, Stephen F., et al. "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs." Nucleic acids research 25.17 (1997): 3389–3402.
Ye, Xugang, Guoli Wang, and Stephen F. Altschul. "An assessment of substitution scores for protein profile-profile comparison." Bioinformatics 27.24 (2011): 3356–3363.
Rangwala, Huzefa, and George Karypis. "Profile-based direct kernels for remote homology detection and fold recognition." Bioinformatics 21.23 (2005): 4239–4247.
extractProtPSSMFeature extractProtPSSMAcc
x = readFASTA(system.file('protseq/P00750.fasta', package = 'Rcpi'))[[1]]
dbpath = tempfile('tempdb', fileext = '.fasta')
invisible(file.copy(from = system.file('protseq/Plasminogen.fasta', package = 'Rcpi'), to = dbpath))
pssmmat = extractProtPSSM(seq = x, database.path = dbpath)
dim(pssmmat) # 20 x 562 (P00750: length 562, 20 Amino Acids)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.