clean_alignment: Identify and remove repeated haplotypes from a (MalAvi)...



Description

Several lineages in the MalAvi database differ only by ambiguous base calls (e.g., "N" or "Y") and thus represent repeated haplotypes. For phylogenetic analysis, it can make sense to include only one representative of each repeated haplotype, because there is no way to know whether such lineages represent one lineage or two. This function identifies repeated haplotypes in an alignment and randomly selects one lineage to represent each of them. Using this selection, it subsets the alignment so that each haplotype is represented only once.
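The idea can be illustrated with a minimal base R sketch. This is not the package's implementation: it assumes sequences are stored as a named character vector rather than a DNAbin object, treats any character other than A, C, G or T as ambiguous, and groups sequences by single linkage. The lineage names, `same_haplotype()` and `group_haplotypes()` are hypothetical and exist only for this illustration.

```r
# Toy alignment: a named character vector of equal-length sequences.
# (Illustrative data, not taken from MalAvi.)
seqs <- c(
  LIN1 = "ACGTACGT",
  LIN2 = "ACGTNCGT",  # differs from LIN1 only at an ambiguous "N"
  LIN3 = "ACGTACGY",  # differs from LIN1 only at an ambiguous "Y"
  LIN4 = "TCGTACGT"   # differs from the others at an unambiguous base
)

# Hypothetical helper: TRUE if two sequences agree at every position
# where both carry an unambiguous base (A, C, G or T).
same_haplotype <- function(a, b) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  clear <- c("A", "C", "G", "T")
  both_clear <- x %in% clear & y %in% clear
  all(x[both_clear] == y[both_clear])
}

# Hypothetical helper: group sequence names by single linkage, i.e.
# merge two groups whenever any pair of their members matches.
group_haplotypes <- function(seqs) {
  n <- length(seqs)
  group <- seq_len(n)
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      if (same_haplotype(seqs[[i]], seqs[[j]])) {
        group[group == group[j]] <- group[i]
      }
    }
  }
  split(names(seqs), group)
}

groups <- group_haplotypes(seqs)

# Haplotypes shared by more than one lineage
repeated <- Filter(function(g) length(g) > 1, groups)

# Randomly pick one representative per repeated haplotype
selected <- vapply(repeated, function(g) g[sample.int(length(g), 1)],
                   character(1))

# Drop the non-selected members so each haplotype appears once
dropped <- setdiff(unlist(repeated), selected)
seqs_clean <- seqs[!names(seqs) %in% dropped]
```

Because the representative is drawn at random, which of LIN1, LIN2 and LIN3 survives in `seqs_clean` can change between runs of this sketch.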

Usage

clean_alignment(alignment, separate_by_genus = FALSE, haplotype_format_wide = TRUE)

Arguments

alignment

a DNA sequence alignment of class DNAbin.

separate_by_genus

If the alignment is a MalAvi alignment with uncleaned sequence names (see Details), set this to TRUE to output a separate cleaned alignment for each parasite genus. Defaults to FALSE.

haplotype_format_wide

Whether the lineage names associated with each repeated haplotype should be returned in wide format (TRUE, easier to visualize) or long format (FALSE, easier to subset). Defaults to TRUE.
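To make the difference between the two layouts concrete, here is a small base R sketch. The column names (`haplotype`, `lineage`, `lineage_1`, ...) and the lineage names are assumptions made for illustration; the data frame actually returned by the function may be laid out differently.

```r
# Long format: one row per lineage, with a haplotype id column.
# (Hypothetical column and lineage names.)
long <- data.frame(
  haplotype = c(1, 1, 1, 2, 2),
  lineage   = c("LIN1", "LIN2", "LIN3", "LIN7", "LIN9"),
  stringsAsFactors = FALSE
)

# Long format is easy to subset, e.g. all lineages of haplotype 2:
hap2 <- long$lineage[long$haplotype == 2]

# Wide format: one row per haplotype, one column per lineage,
# padded with NA where a haplotype has fewer lineages.
by_hap <- split(long$lineage, long$haplotype)
width  <- max(lengths(by_hap))
wide <- as.data.frame(
  do.call(rbind, lapply(by_hap, function(x) c(x, rep(NA, width - length(x))))),
  stringsAsFactors = FALSE
)
names(wide) <- paste0("lineage_", seq_len(width))
wide <- cbind(haplotype = names(by_hap), wide)
```

The wide table shows each haplotype and all of its lineages on a single row, while the long table works directly with ordinary logical indexing.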

Details

In a MalAvi alignment, the default sequence (i.e., lineage) names carry extra information and typically begin with a letter that indicates the parasite genus. If separate_by_genus is set to TRUE, this information is used to separate the alignment by parasite genus.
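A rough sketch of splitting names by their leading letter is shown below. It assumes, without confirmation from this page, that "P", "H" and "L" stand for Plasmodium, Haemoproteus and Leucocytozoon; the sequence names are invented and the real MalAvi name format may differ.

```r
# Hypothetical sequence names whose first letter encodes the genus
seq_names <- c("P_TOY01", "H_TOY02", "L_TOY03", "P_TOY04")

# Assumed mapping from leading letter to parasite genus
genus_key <- c(P = "Plasmodium", H = "Haemoproteus", L = "Leucocytozoon")

# Look up the genus of each sequence from its first character
genus <- unname(genus_key[substr(seq_names, 1, 1)])

# One vector of sequence names per genus
by_genus <- split(seq_names, genus)
```

Each element of `by_genus` could then be used to subset the alignment, giving one cleaned alignment per genus.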

Value

Returns a list composed of the following elements:

repeated_haplotypes

A data frame (in wide or long format) of repeated haplotypes and their associated sequence (lineage) names.

selected_lineages

A vector of randomly selected sequence (lineage) names, one chosen to represent each repeated haplotype.

alignment_clean

A sequence alignment of class DNAbin that has only a single representative for each haplotype. If separate_by_genus is set to TRUE, the list instead contains alignment_clean_Plasmodium, alignment_clean_Haemoproteus, and alignment_clean_Leucocytozoon.

Author(s)

Vincenzo A. Ellis vincenzoaellis@gmail.com

Examples

## load the package, then load the long seqs alignment from MalAvi and clean it
library(malaviR)
long.seqs <- extract_alignment("long seqs")
long.seqs.clean <- clean_alignment(long.seqs)
long.seqs.clean

vincenzoaellis/malaviR documentation built on Oct. 10, 2019, 10:55 p.m.