
The descriptors class is an extension to the `data.frame` class and contains, in addition to the descriptors, information about any response data and p-values which describe the difference between the sequences and the space of possible sequences. The class should be created by a call to `descriptors` (see arguments and details below) or `simpleDescriptors`.

Objects can be created by calls of the form `descriptors(seqs, response=numeric(0), base.frame=NA, do.var=TRUE, alags=c(1,2,3), do.mean=TRUE, do.counts=TRUE, do.position=TRUE, alphabet=seqs@alphabet, include.statistics=TRUE, accuracy=0.01)`

- seqs
  A `Sequences` object.
- response
  An optional array containing responses for each sequence. `nrow(seqs)` should be equal to `length(response)`.
- base.frame
  A `data.frame` containing descriptors calculated on each amino acid. See details.
- do.var
  Calculate the additional descriptors which are the variance of single residue descriptors along the sequence.
- do.mean
  Calculate the mean of the single residue descriptors along the sequence.
- do.counts
  Provide descriptors of various counts, like the number of each residue type.
- do.position
  Provide position specific descriptors.
- alphabet
  The alphabet to use for calculating counts.
- include.statistics
  If `TRUE`, the function will calculate the p-values of the descriptors. See details.
- accuracy
  The accuracy of the computed statistics on the descriptors.

The descriptor calculation methods used here are not as sophisticated
as those provided in some of the more complete QSAR packages. Instead,
the package relies on making various permutations of descriptors calculated on
single amino acids. There are two reasons for this. First, descriptors
can be calculated quickly, without relying on another
program. Second, it is easier to calculate the distribution of
the descriptors over the sequence space. Whether the descriptors can be
calculated across the sequence space in practice also depends on the number of
descriptors and the chain length of the sequence. The advantage of
knowing the descriptors on the whole sequence space is that it is easy to
determine whether a descriptor on the sequences is significant. For
example, if the number of hydrogen bond donors is three standard
deviations above the mean number of hydrogen bond donors over all
sequence space, then that is a significant descriptor. This is
expressed as a p-value, which is calculated from a `wilcox.test`, the
Wilcoxon test, a non-parametric alternative to Student's t-test.
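The hydrogen bond donor example can be sketched with base R alone. This is a minimal illustration on simulated counts, assuming a Poisson-distributed descriptor; the variable names (`background`, `observed`) are hypothetical and the code is not the package's own implementation.

```r
# Hypothetical illustration only: the counts are simulated, not package data.
set.seed(1)

# Simulated hydrogen bond donor counts over a sample of the sequence space.
background <- rpois(5000, lambda = 4)

# Simulated counts for a small set of observed sequences, shifted upward.
observed <- rpois(30, lambda = 7)

# Wilcoxon rank-sum test; it makes no normality assumption.
# exact = FALSE requests the normal approximation, which tolerates ties.
result <- wilcox.test(observed, background, exact = FALSE)
p_value <- result$p.value

# Shift of the observed mean from the background mean, in background SDs.
z_shift <- (mean(observed) - mean(background)) / sd(background)
```

A small `p_value` indicates that the observed counts are unlikely to be a typical draw from the background distribution, which is the sense in which a descriptor is called significant here.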

The calculations are based on the given `base.frame` parameter. Given that
`data.frame`, which contains the descriptors
calculated on all the individual amino acids, it is possible to
calculate many sequence-level descriptors. If the means are being
calculated (`do.mean=TRUE`), then the mean of the descriptors for
each sequence is calculated. This doubles the number of
descriptors. The same is true of `do.var`, which uses the
variance along the sequence. The autocorrelation function can also be
calculated along the chain, again increasing the number of resulting
descriptors. This may be interesting for describing alternating
patterns. The position-specific descriptors are simply the individual
descriptors at a certain position, for example the number of hydrogen
bond donors at position 2.
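The sequence-level descriptors described above can be sketched in base R. The per-residue table `base`, its values, and the helper `autocorr` are hypothetical stand-ins chosen for illustration; they are not `defaultBaseMatrix`, and the package's exact autocorrelation definition may differ.

```r
# Hypothetical per-residue table: rows are residues, columns are descriptors.
# The values are made up for illustration and are not defaultBaseMatrix.
base <- data.frame(
  hbond.donors = c(0, 1, 1, 4),
  charge       = c(0, 1, 0, 1),
  row.names    = c("A", "K", "S", "R")
)

# Split one sequence into residues and look up each residue's descriptors.
residues <- strsplit("KARSK", "")[[1]]
per_residue <- base[residues, , drop = FALSE]

# Mean and variance of each descriptor along the sequence.
seq_mean <- colMeans(per_residue)
seq_var  <- apply(per_residue, 2, var)

# Lag-k autocorrelation along the chain (assumed definition).
autocorr <- function(x, lag) {
  n <- length(x)
  centered <- x - mean(x)
  sum(centered[1:(n - lag)] * centered[(1 + lag):n]) / sum(centered^2)
}
seq_acf1 <- apply(per_residue, 2, autocorr, lag = 1)

# Position-specific descriptor: hydrogen bond donors at position 2.
donors_pos2 <- per_residue[2, "hbond.donors"]
```

In this toy sequence the charge alternates along the chain, so its lag-1 autocorrelation is strongly negative, which is the kind of alternating pattern the autocorrelation descriptors are meant to capture.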

One is often more interested in understanding what is common amongst
the active sequences. This may be done by comparing a descriptor on the
active sequences to the same descriptor on the inactive sequences. Since
inactive sequences are rarely collected in peptide libraries, we may
approximate the inactive sequences as all sequences. **This assumption only
holds if the number of active sequences is low relative to the size of the
sequence diversity.** This is often the case, but it must be verified during the
experiment. With this assumption, p-values may be calculated for each
descriptor. These p-values do not assume normality and are a measure of
the overlap between the active sequences and inactive sequences. They
are calculated using a Wilcoxon test (`wilcox.test`). A low p-value is considered
significant, and such a descriptor may be considered to be related to
activity. **Remember that a descriptor may be important in
connection to a motif.** Thus it is important to do both descriptor and
motif discovery. Setting `include.statistics` will calculate the p-values
for each of the descriptors. This is only practical for shorter sequence
lengths, less than 10.
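A quick calculation shows why length matters, both for the practicality of the statistics and for the assumption that active sequences are a small part of the space. The sketch below assumes a 20-letter alphabet without gaps and a hypothetical set of 100 active sequences; neither number comes from the package.

```r
# Number of distinct gap-free sequences for a 20-letter alphabet, by length.
alphabet_size <- 20
lengths <- c(2, 4, 6, 8, 10)
space_size <- alphabet_size^lengths
names(space_size) <- lengths

# Fraction of the space taken up by a hypothetical set of 100 active sequences.
active_fraction <- 100 / space_size
```

At length 2 the hypothetical active set covers a quarter of the 400 possible sequences, so treating all sequences as inactive would be a poor approximation. By length 10 the space holds about 10^13 sequences, so the approximation is safe but covering the space becomes expensive.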

If `base.frame` is `NA`, then the default, `defaultBaseMatrix`, will be used. See the documentation on that dataset for more information.

- `.Data`: Object of class `"list"`. The descriptors as a `data.frame`. Each row is the descriptor set for a single sequence.
- `response`: Object of class `"numeric"`. An optional numeric array containing responses for the sequences.
- `names`: Object of class `"character"`. The descriptor names (inherited from `data.frame`).
- `row.names`: Object of class `"data.frameRowLabels"`.
- `.S3Class`: Object of class `"character"`.
- `pvalues`: Object of class `"numeric"`. An optional array containing estimated p-values for each descriptor. The p-value represents how different the descriptor set is as compared to a set of random peptides of the same length WITHOUT GAPS.

Class `"data.frame"`, directly.
Class `"list"`, by class "data.frame", distance 2.
Class `"oldClass"`, by class "data.frame", distance 2.
Class `"vector"`, by class "data.frame", distance 3.

Andrew White

`Sequences`, `defaultBaseMatrix`, `wilcox.test`

```
# calculate some descriptors
data(SHP2Sequences)
# turn off most of the descriptors so it goes fast
SHP2desc <- descriptors(SHP2Sequences, do.var=FALSE,
                        alags=c(), do.mean=TRUE, do.counts=FALSE,
                        do.position=FALSE, include.statistics=FALSE)
# get some descriptors and response sets
data(AMPSequences)
data(AMPSequences.response)
AMPdesc <- descriptors(AMPSequences, response=AMPSequences.response[,1], do.var=FALSE,
                       alags=c(), do.mean=TRUE, do.counts=FALSE,
                       do.position=FALSE, include.statistics=FALSE)
```
