There is no "Submit" button. If it seems slow to update, it is busy processing your data. Please be patient.
This phase of analysis requires:
You can choose either to use the example data from CH505 or your own alignment. If you select your own alignment file, consider using the Advanced Options.
You must choose a setting for the cutoff parameter value. The number of sites selected depends on the TF loss cutoff threshold. Higher cutoffs include fewer sites. This is set high by default, but can be adjusted using the slider bar.
A list of selected sites will appear when defined. The table is interactive. You can sort rows by clicking on the column name and page through results. Searching is also possible.
Things to look for when reviewing results are the number of sites and accurately parsed timepoint labels.
Timepoint labels can be in any units and are assumed to indicate time elapsed since infection. A mixture of different units is not advised. The values are used primarily when reordering sites. After parsing the field indicated, any characters (e.g. "d123" or "56dpi") will be stripped and the remaining digits converted to a numeric value (123 and 56).
Other informative columns for review are when_up and tf_area. The former indicates when the site first exceeds 10% TF loss, or another value if given by the "Order sites by when they first reach..." slider. The tf_area is computed as the polygon area under the TF line, which may help resolve timing between sites that revert or change at different rates. The columns labeled 'l' and 'r' are used internally to label sites where indels occur relative to the reference sequence, and can be disregarded. (They will eventually be removed from this table.) For rows where these values equal, either value gives the reference sequence numbering. Rows where 'l' and 'r' differ indicate inserted sites, relative to the reference sequence, as is common for hypervariable loops of the HIV-1 Envelope glycoprotein.
When you are satisfied with the selected sites, check the "Send selected sites to Step 2" box to proceed, then click on "Step 2: Select Sequences".
Check the box to enable several more choices:
Not yet done:
Information on this tab depends on information provided in the preceding step. When results are available, a summary of the number of selected sequences will appear on the top left, and a list of concatamers will appear in a large panel.
Not yet done (future "advanced options"):
Concatamers are the selected sites from Step 1, strung together. One concatamer is listed for each of the selected sequences. The order of sites from left to right is intended to capture their relative strength of selection, sorted first by when each site first reaches at least 10% TF loss, then by the cumulative (non-) TF area.
Each link under "Download Results" provides a different summary of the analysis, including plots of sequence logos for selected sites among the selected sequences.
The code to generate sequence logos is a standalone python program. If you see an error like this "Could not find Ghostscript on path. There should be either a gs executable or a gswin32c.exe on your system's path" then please install gs or gswin32c.exe per instructions for "Dependencies" provided at http://weblogo.threeplusone.com/manual.html#download
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.