hd_seqs: High diversity sequence sample



The HVTN 503/Phambili study (Gray et al. Lancet Infect Dis. 2011) followed HIV-negative subjects, monitoring them for HIV-1 infection. To produce this dataset, we took the PID Illumina MiSeq sequence data from sample HVTN503-162450071-1056 and built phylogenetic trees with RAxML. The following RAxML settings were used:

-f a

Perform rapid bootstrap analysis and search for the best-scoring maximum likelihood tree in one program run.

-x 12345

Seed for the random number generator used by the rapid bootstrap analysis.

-p 12345

Seed for the random number generator used in the parsimony inferences.

-# 100

The number of bootstrap analyses to run on distinct starting trees.


The model used for the nucleotide substitutions: the general time reversible (GTR) model with optimization of the substitution rates and the GAMMA model of rate heterogeneity.
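As a rough illustration, the settings listed above can be assembled into a single RAxML call from R. This is only a sketch: the executable name `raxmlHPC`, the input file `alignment.fasta` and the run name `hd_seqs_run` are placeholders that do not come from this documentation, and `-m GTRGAMMA` is inferred from the model description rather than quoted from it.

```r
# Flags taken directly from the settings listed above.
documented <- c("-f", "a",      # rapid bootstrap + best-scoring ML tree search
                "-x", "12345",  # seed for the rapid bootstrap analysis
                "-p", "12345",  # seed for the parsimony inferences
                "-#", "100")    # number of bootstrap analyses

# ASSUMPTIONS, not stated in this documentation:
#   -m GTRGAMMA  inferred from the description of the substitution model
#   -s, -n       placeholder alignment file and run name
assumed <- c("-m", "GTRGAMMA",
             "-s", "alignment.fasta",
             "-n", "hd_seqs_run")

raxml_args <- c(documented, assumed)

# Build the command line for inspection; nothing is executed here.
cmd <- paste("raxmlHPC", paste(raxml_args, collapse = " "))
cat(cmd, "\n")

# To actually run it (requires RAxML on the PATH and a real alignment):
# system2("raxmlHPC", raxml_args)
```

Keeping the documented flags separate from the assumed ones makes it easy to swap in the real alignment path and model once they are known.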

A random subtype-C sequence (referred to as the seed sequence) was selected from LANL (C.ZA.08.707PKE34F2.HM623575), restricted to the same amplicon as the real dataset, and mutated according to the tree produced by RAxML.

To simulate the test data, the tree was loaded into R as a data.frame in which each row represents an edge. The data.frame has three columns: the first lists the ancestor, the second the descendant, and the third the length of the edge. The simulation is initiated by assigning the seed sequence to the descendant in the first row. The ancestor in that row is then constructed by randomly mutating the seed sequence until it has diverged by the edge length. The newly simulated ancestor sequence is then used to generate the other sequences that are directly connected to it. This process continues until every sequence in the tree, including those at the internal nodes, has been generated. Extra variability was introduced into this dataset by multiplying all the branch lengths by 2.
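The edge-by-edge simulation described above can be sketched in base R as follows. This is not the code used to build the dataset. The function names, the toy tree and the random seed sequence are hypothetical. The sketch also assumes that an edge length is a proportion of sites, so that "diverged by the edge length" means changing `round(length * scale * n_sites)` distinct positions.

```r
# Change `n_mut` distinct, randomly chosen positions to a different base.
mutate_seq <- function(seq_chars, n_mut) {
  bases <- c("a", "c", "g", "t")
  for (i in sample.int(length(seq_chars), n_mut)) {
    seq_chars[i] <- sample(setdiff(bases, seq_chars[i]), 1)
  }
  seq_chars
}

# Walk the edge table, generating the unknown end of every edge from its
# known end, starting at the descendant in the first row.
simulate_tree <- function(edges, seed_seq, scale = 2) {
  n_sites <- length(seed_seq)
  seqs <- list()
  seqs[[as.character(edges$descendant[1])]] <- seed_seq
  pending <- seq_len(nrow(edges))
  while (length(pending) > 0) {
    resolved <- integer(0)
    for (i in pending) {
      anc <- as.character(edges$ancestor[i])
      des <- as.character(edges$descendant[i])
      known <- c(anc, des) %in% names(seqs)
      if (sum(known) != 1) next  # neither end reachable yet
      # ASSUMPTION: edge length is a proportion of sites to change.
      n_mut <- min(n_sites, round(edges$length[i] * scale * n_sites))
      from <- if (known[1]) anc else des
      to   <- if (known[1]) des else anc
      seqs[[to]] <- mutate_seq(seqs[[from]], n_mut)
      resolved <- c(resolved, i)
    }
    if (length(resolved) == 0) stop("edges do not form a single connected tree")
    pending <- setdiff(pending, resolved)
  }
  seqs
}

# Toy example: a hypothetical 4-edge tree and a random 100-nt seed sequence.
set.seed(1)
edges <- data.frame(ancestor   = c("n1", "n1", "n2", "n2"),
                    descendant = c("t1", "n2", "t2", "t3"),
                    length     = c(0.02, 0.01, 0.03, 0.00))
seed <- sample(c("a", "c", "g", "t"), 100, replace = TRUE)
sim <- simulate_tree(edges, seed, scale = 2)
```

Because the mutated positions on an edge are distinct, the two ends of every edge differ at exactly the requested number of sites. The `scale = 2` default mirrors the doubling of branch lengths mentioned above.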




A SeqFastadna object from the seqinr package.


Based on sample HVTN503-162450071-1056 from the HVTN 503/Phambili study (Gray et al. Lancet Infect Dis. 2011).

philliplab/hypermutR documentation built on Sept. 2, 2020, 2:51 p.m.