swarmtools: Identify sites under putative immune selection.

Description

Given a serially sampled protein sequence alignment, this function lists sites where loss of the transmitted-founder (TF) form exceeds a cutoff value within a given sample timepoint.

Usage

swarmtools(aas_aln = NULL, aas_file = NULL,
  alignment_format = "fasta", tf_index = 1, tf_name = NULL,
  timepoints_parser = NULL, refseq_lut_file = NULL,
  refseq_lut = NULL, refseq_name = "HXB2", pngs2o = T,
  tf_loss_cutoff = NULL, frequency_when_up = 10, exclude_vloops = F,
  included_sites = NULL, excluded_sites = NULL)

Arguments

aas_aln

Alignment matrix.

aas_file

Alignment file.

alignment_format

Format of alignment file/s; must be one of these: "fasta", "clustal", "phylip", "msf", or "mase".

tf_index

Index of the transmitted-founder (TF) sequence in the alignment (default 1).

tf_name

Name of the transmitted-founder (TF) sequence (default NULL).

timepoints_parser

Timepoints parsing function generated by lassie::create.timepoint.parser()

refseq_lut_file

File containing the reference sequence lookup table (LUT).

refseq_lut

Reference sequence lookup table. Optional; it can help to apply reference sequence numbering from a separate alignment that does not contain the reference sequence.

refseq_name

Reference sequence name, used for numbering, e.g. HXB2.

pngs2o

Switch to mark asparagines (N) in potential N-linked glycosylation (PNG) motifs as O.

tf_loss_cutoff

Threshold value (or vector of values) for including a site.

frequency_when_up

Sites are sorted by when they first reach this value.

exclude_vloops

Automatically add hypervariable loop sites to the list of excluded sites.

included_sites

List of included sites.

excluded_sites

List of excluded sites.
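
As a rough illustration of what the pngs2o switch describes, the sketch below marks asparagines in the commonly used N-X-[S/T] motif (X is any residue except proline) as O. The helper name mark_pngs and the exact motif definition are assumptions made for illustration; lassie's own implementation may differ, for example in how it treats alignment gaps.

```r
# Hypothetical helper, not part of lassie: replace N with O wherever it
# starts an N-X-[S/T] motif, with X being any character other than P.
# The lookahead leaves the following residues in place, so overlapping
# motifs are each evaluated independently.
mark_pngs <- function(aa_string) {
  gsub("N(?=[^P][ST])", "O", aa_string, perl = TRUE)
}

# Toy ungapped sequence: the N followed by "PS" is not marked.
marked <- mark_pngs("ANNTSNPS")
print(marked)
```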

Details

The sample timepoints should be part of the sequence names, and it should be possible to extract them by splitting sequence names with a particular separator and taking one of the resulting fields as the timepoint label. TF loss is computed among sequences that have the same timepoint label. By default, the timepoint label is in the first dot-delimited field. To use a different layout, call lassie::create.timepoint.parser() to specify how fields are separated and which field to use, then pass the returned function to swarmtools using the 'timepoints_parser' option.
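
The default rule described above (first dot-delimited field) can be sketched in base R as follows. The sequence names and the helper get_field are hypothetical and only mimic the idea; this is not how lassie::create.timepoint.parser() is necessarily implemented, and its arguments are not shown here.

```r
# Hypothetical sequence names; the timepoint label is the first
# dot-delimited field of each name.
seq_names <- c("w000.TF", "w004.clone01", "w004.clone02", "w014.clone07")

# Illustrative helper (not part of lassie): split each name on a fixed
# separator and keep one field as the timepoint label.
get_field <- function(x, sep = ".", field = 1) {
  parts <- strsplit(x, sep, fixed = TRUE)
  vapply(parts, function(p) p[field], character(1))
}

timepoint_labels <- get_field(seq_names)

# Group sequence names by label; TF loss would be computed within each group.
groups <- split(seq_names, timepoint_labels)
print(groups)
```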

Timepoint labels can be in any units and are assumed to indicate time elapsed since infection. A mixture of different units is not advised. The values are used primarily when reordering sites and plotting frequency dynamics (see lassie::report.variant.frequencies()). After parsing the correct field, any leading or trailing non-numeric characters (e.g. "d123" or "56dpi") will be stripped and the remainder converted to a numeric value (123 and 56).
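
A minimal sketch of the label-to-number conversion, assuming that everything other than digits and a decimal point is discarded. The helper name and the regular expression are illustrative assumptions; the pattern lassie actually uses may differ.

```r
# Illustrative helper (not part of lassie): drop every character that is
# not a digit or a decimal point, then convert the remainder to numeric.
to_numeric_timepoint <- function(labels) {
  as.numeric(gsub("[^0-9.]", "", labels))
}

# The two labels used as examples in the text.
numeric_timepoints <- to_numeric_timepoint(c("d123", "56dpi"))
print(numeric_timepoints)
```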

You should provide at least the name of an alignment file or an alignment matrix (via seqinr::as.matrix.alignment()) already in memory, and specify a percentage value (0-100) for tf_loss_cutoff. Without these, no sites will be returned. This is not an error state, as the returned swarmtools object can later be passed to lassie::set.alignment.file() and lassie::set.tf.loss.cutoff().
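
To make the role of the percentage cutoff concrete, here is a small base-R sketch of per-timepoint TF loss on a toy alignment matrix. The data, the variable names, and the inclusive comparison with the cutoff are assumptions for illustration; this is not lassie's internal code.

```r
# Toy alignment matrix: rows are sequences, columns are sites.
# Row 1 plays the role of the TF sequence (tf_index = 1).
aln <- rbind(
  "d0.TF" = c("M", "K", "N"),
  "d30.a" = c("M", "K", "N"),
  "d30.b" = c("M", "R", "N"),
  "d90.a" = c("M", "R", "D"),
  "d90.b" = c("M", "R", "N")
)
tf <- aln[1, ]

# Timepoint label: first dot-delimited field of each sequence name.
timepoint <- sub("\\..*$", "", rownames(aln))

# Percent of sequences that differ from the TF form, computed per
# timepoint (rows of the result) and per site (columns of the result).
tf_loss <- t(sapply(split(seq_len(nrow(aln)), timepoint), function(rows) {
  differs <- aln[rows, , drop = FALSE] != rep(tf, each = length(rows))
  100 * colMeans(differs)
}))

# Keep sites whose TF loss reaches the cutoff in at least one timepoint.
# Whether the boundary is inclusive is an assumption of this sketch.
tf_loss_cutoff <- 80
selected_sites <- which(apply(tf_loss >= tf_loss_cutoff, 2, any))
print(tf_loss)
print(selected_sites)
```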

Value

swarmtools object

Examples

## Not run: 
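  # Identify sites where TF loss exceeds 80%, using the example
  # alignment file shipped with the lassie package.
  A <- lassie::swarmtools(
    aas_file = system.file("extdata", "CH505-gp160.fasta", package = "lassie"),
    tf_loss_cutoff = 80)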
  B <- lassie::swarmset(A)

## End(Not run)

phraber/lassie documentation built on May 25, 2019, 6:01 a.m.