In KathrynCampbell/MADDOG: Lineage Designation and Assignment

knitr::opts_chunk$set(echo = TRUE)

Lineage Designation

Lineage designation is an initial step to identify all the lineages contained within a set of sequences, defined by an existing set of rules as recommended in Rambaut et al. (2020), and name them accordingly.

The designation requires some initial steps to be undertaken on the sequences first. They need to be aligned (it's recommended to use MAFFT and use the FFT-NS-2 algorithm), and then a tree created including bootstrapping (recommended using IQTREE2 with model selection and 100 bootstrap replicates or 1000 ultrafast bootstrap replicates). The ancestral sequences then need to be reconstructed (recommend using Treetime ancestral).

The lineage designation function requires the the tree, alignment, ancestral reconstructions and metadata of the sequences. The sequence IDs must match between all of these. The metadata file must have a column called 'ID' which contains the ID's for all the sequences. It also must have a column called 'year' that contains the collection year of each sequence. Additionally, it must have a column called 'country'. This should contain the country of origin of each sequence. If you don't have this information, you can leave the column blank but it needs to exist. If you have more detailed information about the origin of the sequence (e.g. a state) you can put this in the country column instead of the country. It also needs to have an 'assignment' column which contains any known clade assignments. If you don't have this information, you can leave this column blank but it needs to exist.

The fuction undertakes the following steps: Lineage defining nodes are identified according to thresholds on bootstrap support (>70, or >90 for ultrafast bootstrapping) and cluster size (>10 descendents of >95% coverage, excluding gaps and ambiguous bases). For each node still in consideration, the ancestral sequence is extracted. To define a new lineage, there must be at least one SNP between the ancestral sequence and its descendents that is shared between all sequences in the cluster. The lineages are then named according to Rambaut et al. (2020), but integrating any existing assignments if they are present.

There are two functions as part of lineage designation that work in the same way, but return different results:

seq_designation() returns a lineage designation and information about each sequence.

node_info() returns each of the defining nodes of the designations, with information about them.

library(MADDOG)
 tree<-ape::read.tree(system.file("extdata", "Examples/Lineage_designation/Example_designation_aligned.fasta.contree", package = "MADDOG"))
 alignment<-seqinr::read.fasta(system.file("extdata", "Examples/Lineage_designation/Example_designation_aligned.fasta", package = "MADDOG"))
 alignment<-ape::as.alignment(alignment)
 metadata<-read.csv(system.file("extdata", "Examples/Lineage_designation/Example_designation_metadata.csv", package = "MADDOG"))
 ancestral<-seqinr::read.fasta(system.file("extdata", "Examples/Lineage_designation/ancestral_sequences.fasta", package = "MADDOG"))
 ancestral<-ape::as.alignment(ancestral)

sequence_designation<-seq_designation(tree, 90, alignment, metadata, ancestral)
defining_node_information<-node_info(tree, 90, alignment, metadata, ancestral)
names(defining_node_information)<-c("node", "n_tips", "diff", "overlaps", "lineage", "numbers")

head(sequence_designation, 20)
defining_node_information

An additional function can be run to find out information about each of the newly designated lineages. It requires the seq_designation output, plus the metadata.

lineage_info<-lineage_info(sequence_designation, metadata) ; lineage_info

Lineage Assignment

Lineage assignment identifes which existing lineage, determined by lineage designation, a new sequence belongs to.

The function requires the test sequences in fasta format, and will return information about their assignments.

``` {r data_import2, include = FALSE} sequences<-seqinr::read.fasta(system.file("extdata", "Examples/Lineage_assignment/example.fasta", package = "MADDOG"))

``` {r assignment}
assignments<-assign_lineages(sequences, "RABV"); assignments

Visualising Lineages

This group of Lineage Figure functions produce figures to assist in interpretation of lineage data.

They all require the outputs of the lineage designation functions; seq_designation, node_info and lineage_info, as well as the tree and metadata used for lineage designation. The colour schemes across all the figures are comparable.

The sunburst function produces a sunburst plot to visualise the hierarchical relationship of the lineages.

``` {r sunburst} sunburst(lineage_info, defining_node_information, tree, metadata, sequence_designation)

The lineage_tree function produces a lineage tree with a lineage bar to visualise the phylogenetic positions of the lineages.

``` {r tree}
lineage_tree(lineage_info, defining_node_information, tree, metadata, sequence_designation)

KathrynCampbell/MADDOG documentation built on April 14, 2025, 10:40 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com