CCModel-class: Class "CCModel"

Description Objects from the Class Discriminant function of model Slots Methods Default model PrOCoilModel Alternative model PrOCoilModelBA Author(s) References See Also Examples

Description

S4 class representing a coiled coil prediction model

Objects from the Class

In principle, objects of this class can be created by calls of the form new("CCModel"), although it is probably never necessary to create such an object from scratch - and not advised either. The default model is stored in the object PrOCoilModel. An alternative model, PrOCoilModelBA, that is optimized for balanced accuracy is available too (see below). Custom models can be loaded from files using the function readCCModel.

Discriminant function of model

Given a new coiled coil sequence x and a model, the discriminant function of the model is given as

f(x)=b+sum over all p in P of (N(p,x) w(p)),

where b is a constant offset, N(p,x) denotes the number of occurrences of pattern p in sequence x, and w(p) is the weight assigned to pattern p. P is the set of all patterns contained in the model. In the models used in the procoil package, the weights are computed from a support vector machine. Models can include kernel normalization or not. The formula above refers to the variant without kernel normalization. If kernel normalization is employed, the weights are computed in a different way and the discriminant function changes to

f(x)=b+(sum over all p in P of (N(p,x) w(p)))/R(x),

where R(x) is a normalization value depending on the sample x. It is defined as follows:

R(x)=sqrt(sum over all p in P of N(p,x)^2)

The procoil package does not consider arbitrary patterns, but only very specific ones: pairs of amino acids at fixed register positions with no more than a maximum number m of residues in between. Internally, these patterns are represented as strings with an amino acid letter on the first position, then a certain number of wildcards (between 0 and m as noted above), then the second amino acid letter, and an aligned sequence with the same number of wildcards and letters ‘a’-‘g’ denoting the heptad register position on the first and last amino acid, e.g. “N..La..d”. This pattern matches a coiled coil sequence if the sequence has an ‘N’ (Asparagine) at an ‘a’ position and a ‘L’ (Leucine) at the next ‘d’ position. For instance, the GCN4 wildtype has one occurrence of this pattern:

1
2
3
4
5
    MKQLEDKVEELLSKNYHLENEVARLKKLV
    abcdefgabcdefgabcdefgabcdefga
                  N..L
                  a  d
  

Slots

b:

Object of class numeric the value b as described above

m:

Object of class integer the value m as described above

scaling:

Object of class logical indicating whether the model should employ kernel normalization

weights:

Object of class matrix storing all pattern weights; the matrix in this slot is actually consisting of only one row that contains the weights. The patterns are stored in column names of the matrix and encoded in the format described above

Methods

predict

signature(object = "CCModel"): see predict

show

signature(object = "CCModel"): displays the most important information stored in the CCModel object object, such as, kernel parameters and a summary of weights.

weights

signature(object="CCModel"): returns the weights stored in object as a named numeric vector.

Default model PrOCoilModel

The procoil package provides a default coiled coil prediction model, PrOCoilModel. The model was created with the kebabs package [Palme et al., 2015] using the coiled coil kernel with m=5, C=2, and kernel normalization on the BLAST-augmented data set. It is optimized for standard (unbalanced) accuracy, i.e. it tries to minimize the probability of misclassifications. Since dimers are more frequent in the data set, it slightly favors dimers for unknown sequences.

Note that this is not the original model as described in [Mahrenholz et al., 2011]. The models have been re-trained for version 2.0.0 of the package using a newer snapshot of PDB and newer methods. The original models are still available for download and can still be used if the user wishes to. For detailed instructions, see the package vignette.

Alternative model PrOCoilModelBA

As mentioned above, the default model PrOCoilModel slightly favors dimers. This may be undesirable for some applications. For such cases, an alternative model PrOCoilModelBA is available that is optimized for balanced accuracy, i.e. it tries not to favor the larger class - dimers -, but may therefore prefer trimers in borderline cases. The overall misclassification probability is slightly higher for this model than for the default model PrOCoilModel.

The model PrOCoilModelBA was created with PSVM [Hochreiter and Obermayer, 2006] using the coiled coil kernel with m=8, C=8, e=0.8, class balancing, and kernel normalization on the PDB data set (i.e. without BLAST augmentation).

The same applies as for PrOCoilModel: this model has been re-trained for package version 2.0.0. For detailed instructions how to use the original models, see the package vignette.

Author(s)

Ulrich Bodenhofer bodenhofer@bioinf.jku.at

References

http://www.bioinf.jku.at/software/procoil/

Mahrenholz, C.C., Abfalter, I.G., Bodenhofer, U., Volkmer, R., and Hochreiter, S. (2011) Complex networks govern coiled coil oligomerization - predicting and profiling by means of a machine learning approach. Mol. Cell. Proteomics 10(5):M110.004994. DOI: 10.1074/mcp.M110.004994

Palme, J., Hochreiter, S., and Bodenhofer, U. (2015) KeBABS: an R package for kernel-based analysis of biological sequences. Bioinformatics 31(15):2574-2576. DOI: 10.1093/bioinformatics/btv176

Hochreiter, S., and Obermayer, K. (2006) Support vector machines for dyadic data. Neural Computation 18:1472-1510. DOI: 10.1162/neco.2006.18.6.1472

See Also

predict-methods

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
showClass("CCModel")

## show summary of default model (optimized for accuracy)
PrOCoilModel

## show weight of pattern "N..La..d"
weights(PrOCoilModel)["N..La..d"]

## show the 10 patterns that are most indicative for trimers
## (as the weights are sorted in descending order in PrOCoilModel)
weights(PrOCoilModel)[1:10]

## predict oligomerization of GCN4 wildtype
GCN4wt <- predict(PrOCoilModel,
                  "MKQLEDKVEELLSKNYHLENEVARLKKLV",
                  "abcdefgabcdefgabcdefgabcdefga")

## show summary of alternative model (optimized for balanced accuracy)
PrOCoilModelBA

## show weight of pattern "N..La..d"
weights(PrOCoilModelBA)["N..La..d"]

## show the 10 patterns that are most indicative for trimers
## (as the weights are sorted in descending order in PrOCoilModelBA)
weights(PrOCoilModelBA)[1:10]

procoil documentation built on Nov. 8, 2020, 8:05 p.m.