align_and_summarise: Aligns sequence data, plots haplotype networks and summarises...

View source: R/align_and_summarise.R

Description

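Aligns sequence data, draws diagnostic plots (haplotype networks and sequence length histograms), and summarises the data.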

Usage

align_and_summarise(
  directory_path = getwd(),
  alignment_files,
  max_haps_found_together,
  minbp
)

Arguments

directory_path

path to the directory in which to build/call files, defaults to working directory

alignment_files

list of .fas/.fasta files in directory that need alignment, often signaled by the pattern "FOR_ALIGNMENT" in file name

max_haps_found_together

input value for the hapfreq_from_paper function: the threshold number of times a haplotype can be found repeated in the uploads from one study before the paper is assumed to have submitted every sample rather than only unique haplotypes

minbp

minimum length (base pairs) for a sequence to be retained in the data set
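
As a hedged illustration of how the arguments might be assembled, the sketch below builds alignment_files by matching the "FOR_ALIGNMENT" pattern in file names. The directory, the file names, and the threshold values are hypothetical, and passing bare file names (rather than full paths) is an assumption, not something this page confirms.

```r
# Hypothetical working directory populated with empty placeholder files.
work_dir <- file.path(tempdir(), "mtdna_example")
dir.create(work_dir, showWarnings = FALSE)
file.create(file.path(work_dir, c(
  "FOR_ALIGNMENT_species_a.fasta",
  "FOR_ALIGNMENT_species_b.fas",
  "ALIGNED_species_c.fasta"
)))

# Keep only .fas/.fasta files whose names contain "FOR_ALIGNMENT".
fasta_files <- list.files(work_dir, pattern = "\\.(fas|fasta)$")
alignment_files <- as.list(
  fasta_files[grepl("FOR_ALIGNMENT", fasta_files, fixed = TRUE)]
)

# The call might then look like this (threshold values are illustrative only):
# align_and_summarise(
#   directory_path = work_dir,
#   alignment_files = alignment_files,
#   max_haps_found_together = 15,
#   minbp = 600
# )
```

The call itself is left commented out because it depends on the package and on real sequence files being present.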

Details

The function aligns sequence data, draws two types of diagnostic plot (haplotype networks and sequence length histograms), and summarises the data. Two directories for storing these outputs are created automatically: "./network_diagrams" and "./histograms". Where only representative haplotypes were uploaded, rather than individual samples, the populations are recorded for later inspection.
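
The length filter and the histogram output can be pictured with a minimal base R sketch. It is not the package's implementation: the toy sequences, the minbp value, the output file name, and the choice of a PDF device are all assumptions, and the output directory is placed under tempdir() rather than the working directory.

```r
# Toy unaligned sequences (hypothetical data).
seqs <- c(s1 = "ACGTACGTAC", s2 = "ACGT", s3 = "ACGTACGTACGT", s4 = "ACGTAC")
minbp <- 6  # illustrative threshold, not a package default

# Retain sequences of at least minbp base pairs
# (treating the threshold as inclusive is an assumption).
seq_lengths <- nchar(seqs)
retained <- seqs[seq_lengths >= minbp]

# Create an output directory if it does not already exist.
out_dir <- file.path(tempdir(), "histograms")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# Draw a histogram of the original sequence lengths and write it to file.
hist_file <- file.path(out_dir, "toy_sequence_lengths.pdf")
pdf(hist_file)
hist(seq_lengths, main = "Original sequence lengths", xlab = "Length (bp)")
abline(v = minbp, lty = 2)  # mark the minbp cut-off
dev.off()
```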

Unaligned sequence files are read in, and the aligned sequences are written out as files. Histograms of the original sequence lengths and of the new aligned sequence lengths are drawn and saved. Haplotype networks are also drawn and written out, and summary data are recorded. If only one copy of each haplotype is ever found, the accessions are assumed to possibly represent haplotypes rather than sample frequencies, and the populations are flagged for further inspection.
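
The flagging logic described above can be sketched roughly as follows. The function classify_study and its return labels are hypothetical, the strict ">" comparison against max_haps_found_together is an assumption, and the "undetermined" case is not covered by this page; the real behaviour lives in hapfreq_from_paper.

```r
# Hypothetical helper: classify one study's uploads from its haplotype labels.
classify_study <- function(haplotypes, max_haps_found_together) {
  counts <- table(haplotypes)  # how often each haplotype appears
  if (all(counts == 1)) {
    # Every haplotype seen once: uploads may be representative haplotypes.
    "flag_for_inspection"
  } else if (max(counts) > max_haps_found_together) {
    # A haplotype repeats beyond the threshold: assume every sample was uploaded.
    "every_sample_submitted"
  } else {
    # Case not described in the documentation; label is illustrative only.
    "undetermined"
  }
}

singletons <- classify_study(c("H1", "H2", "H3"), 3)
repeated   <- classify_study(c("H1", "H1", "H1", "H1", "H2"), 3)
in_between <- classify_study(c("H1", "H1", "H2"), 3)
```

With these toy inputs, the first study would be flagged for inspection and the second treated as having submitted every sample; the third falls between the two rules.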


EvolEcolGroup/mtDNAcombine documentation built on July 8, 2021, 10:30 p.m.