The following steps are taken:
These are the 'bread and butter' alignments, which have two conditions for acceptance.
We might also consider the frequency of occurrence in gpm; for now, we are leaving that aside.
The next phase is to consider the non-modal hydroxylation alignments. The strategy for each is:
1509 -
1509 - T only -
We went through the hand-classified data for sample C1, doing each step in turn and trying to work out where the issues were.
The sequence GLHGEFGLPGPAGPR has the following entries in mcs:
716 GLHGEFGLPGPAGPR 0 0 1461.753 0.032736
717 GLHGEFGLPGPAGPR 1 0 1477.753 0.259136
718 GLHGEFGLPGPAGPR 2 0 1493.753 0.456776
719 GLHGEFGLPGPAGPR 3 0 1509.753 0.228096
720 GLHGEFGLPGPAGPR 4 0 1525.753 0.023256
and for gpm:
seq nhyd nglut mass1 prob col
537 GLHGEFGLPGPAGPR 1 0 1477.753 1 2
538 GLHGEFGLPGPAGPR 2 0 1493.753 1 2
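As a quick cross-check of the two tables, here is a minimal Python sketch (Python is chosen only for illustration). It rebuilds the mcs mass ladder from the zero-hydroxylation mass, assuming the exact +16.000 step per hydroxylation that the table shows (the nominal mass of oxygen; its monoisotopic mass is 15.9949 Da). It then reads off the most probable level and maps a mass back to its hydroxylation level. The helper names and the 0.01 Da tolerance are hypothetical and not part of mcs or gpm.

```python
# Minimal sketch: hydroxylation mass ladder for GLHGEFGLPGPAGPR (nglut = 0),
# using the values printed in the mcs and gpm tables.

BASE_MASS = 1461.753  # mcs row with nhyd = 0, nglut = 0
HYD_SHIFT = 16.0      # nominal per-hydroxylation step seen in the table (assumption)

# nhyd -> probability, copied from the mcs table
MCS_PROBS = {0: 0.032736, 1: 0.259136, 2: 0.456776, 3: 0.228096, 4: 0.023256}
# nhyd levels listed in the gpm table
GPM_LEVELS = {1, 2}


def ladder_mass(nhyd: int) -> float:
    """Expected mass for a given number of hydroxylations."""
    return BASE_MASS + HYD_SHIFT * nhyd


def level_for_mass(mass: float, tol: float = 0.01):
    """Return the nhyd level whose ladder mass lies within tol of mass, else None.

    The 0.01 Da tolerance is a hypothetical value chosen for illustration.
    """
    for nhyd in sorted(MCS_PROBS):
        if abs(ladder_mass(nhyd) - mass) <= tol:
            return nhyd
    return None


modal_level = max(MCS_PROBS, key=MCS_PROBS.get)  # most probable level in mcs
mcs_only = sorted(set(MCS_PROBS) - GPM_LEVELS)    # levels that gpm does not list

if __name__ == "__main__":
    for n in sorted(MCS_PROBS):
        print(n, f"{ladder_mass(n):.3f}", MCS_PROBS[n])
    print("modal level:", modal_level)
    print("levels for 1477.753, 1493.753, 1509.753:",
          [level_for_mass(m) for m in (1477.753, 1493.753, 1509.753)])
    print("in mcs but not gpm:", mcs_only)
```

The modal level comes out as 2, and level 3 (1509.753) is one that mcs lists but gpm does not.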
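The analysis of sample C1 showed matches at 1477.753 and 1493.753 (1 and 2 hydroxylations, respectively). Although these are the two most common hydroxylation levels, their intensities in C1 are in the reverse order: level 1 is more intense than level 2, whereas mcs predicts level 2 to be the most likely (0.457 vs 0.259). The mcs alignment also matches 3 hydroxylations (1509.753), so in C1 levels 3 and 1 are both higher than level 2.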
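What could be causing this? One possibility is that hydroxylation of particular sites is NOT independent. If each site were hydroxylated independently, the number of hydroxylated sites would follow a binomial distribution (Poisson-binomial if the site probabilities differ), which is always unimodal and so can never make levels 1 and 3 both exceed level 2. Dependence between sites could instead produce a multimodal distribution, unlike the binomial model we are currently using.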
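A minimal Python sketch of this comparison follows. It builds the distribution of the number of hydroxylated sites under an independent-site model by convolving one site at a time (a Poisson-binomial, which reduces to the binomial when all sites share one probability). It then compares that with a 50/50 mixture of two hypothetical subpopulations of molecules, one lightly and one heavily hydroxylated, which is just one simple way sites could be dependent. The site probabilities (0.2, 0.5, 0.6, 0.9 and 0.2/0.8) and the mixture weights are invented for illustration, not fitted to C1.

```python
from typing import List, Sequence


def poisson_binomial_pmf(site_probs: Sequence[float]) -> List[float]:
    """P(k of n sites hydroxylated), k = 0..n, assuming sites are independent.

    With equal site probabilities this is the ordinary binomial.
    """
    pmf = [1.0]  # no sites considered yet: P(0 hydroxylations) = 1
    for p in site_probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, q in enumerate(pmf):
            nxt[k] += q * (1.0 - p)  # this site left unmodified
            nxt[k + 1] += q * p      # this site hydroxylated
        pmf = nxt
    return pmf


def mixture_pmf(components: Sequence[Sequence[float]],
                weights: Sequence[float]) -> List[float]:
    """Weighted mix of equal-length pmfs (e.g. two subpopulations of molecules)."""
    n = len(components[0])
    return [sum(w * c[k] for c, w in zip(components, weights)) for k in range(n)]


def is_unimodal(pmf: Sequence[float], tol: float = 1e-12) -> bool:
    """True if pmf rises (weakly) to its highest value and then falls (weakly)."""
    peak = max(range(len(pmf)), key=lambda k: pmf[k])
    rising = all(pmf[i] <= pmf[i + 1] + tol for i in range(peak))
    falling = all(pmf[i] >= pmf[i + 1] - tol for i in range(peak, len(pmf) - 1))
    return rising and falling


# Hypothetical independent model: four sites with different probabilities
independent = poisson_binomial_pmf([0.2, 0.5, 0.6, 0.9])

# Hypothetical dependent model: half the molecules lightly, half heavily hydroxylated
low = poisson_binomial_pmf([0.2] * 4)
high = poisson_binomial_pmf([0.8] * 4)
mixed = mixture_pmf([low, high], [0.5, 0.5])

if __name__ == "__main__":
    print([round(x, 4) for x in independent], is_unimodal(independent))
    print([round(x, 4) for x in mixed], is_unimodal(mixed))
```

Here `independent` comes out as [0.016, 0.188, 0.43, 0.312, 0.054] and is unimodal, while `mixed` is [0.2056, 0.2176, 0.1536, 0.2176, 0.2056], with levels 1 and 3 above level 2, qualitatively the pattern seen in C1. The mcs probabilities for this peptide also pass `is_unimodal`.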
A list of suggestions ....