fasta_to_tree: Infer trees from fasta files.


Description

Given a folder containing unaligned sequences in fasta format (i.e., clusters), this function aligns each cluster with mafft (small clusters) or pasta (large clusters), removes poorly aligned sites with phyutility, and infers a maximum-likelihood tree with RAxML (small clusters) or fasttree (large clusters). All of these programs must be installed and included in the user's $PATH. Clusters are assumed to be named "cluster1.fa", "cluster2.fa", etc. Clusters with fewer than 1,000 sequences are considered "small," and those with more are considered "large."
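The small/large split above determines which aligner and tree-building program each cluster gets. The base-R sketch below uses a hypothetical helper, classify_clusters(), which is not part of baitfindR. It counts the header lines (those starting with ">") in each clusterN.fa file and applies the 1,000-sequence cutoff. The text does not say how a cluster of exactly 1,000 sequences is handled, so the sketch assumes it is "large".

```r
# Hypothetical helper (not part of baitfindR): count sequences per cluster
# file and label each cluster "small" or "large".
classify_clusters <- function(seq_folder, cutoff = 1000) {
  # Only consider files following the documented naming scheme
  files <- list.files(seq_folder, pattern = "^cluster[0-9]+\\.fa$",
                      full.names = TRUE)
  # Each fasta record starts with a header line beginning with ">"
  n_seqs <- unname(vapply(files, function(f) {
    sum(startsWith(readLines(f), ">"))
  }, integer(1)))
  data.frame(
    file = basename(files),
    n_seqs = n_seqs,
    # Assumption: a cluster with exactly `cutoff` sequences counts as large
    size = ifelse(n_seqs >= cutoff, "large", "small"),
    stringsAsFactors = FALSE
  )
}

# Demo: one tiny cluster and one with 1,200 sequences
demo_dir <- tempfile("fasta_demo")
dir.create(demo_dir)
writeLines(c(">a", "ACGT", ">b", "ACGA"), file.path(demo_dir, "cluster1.fa"))
big <- as.vector(rbind(paste0(">s", 1:1200), "ACGT"))
writeLines(big, file.path(demo_dir, "cluster2.fa"))
res <- classify_clusters(demo_dir)
print(res)
```

A check like this can help predict ahead of time which clusters will be routed to pasta and fasttree rather than mafft and RAxML.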

Usage

fasta_to_tree(
  path_to_ys = pkgconfig::get_config("baitfindR::path_to_ys"),
  seq_folder,
  number_cores,
  seq_type,
  bootstrap = FALSE,
  overwrite = FALSE,
  get_hash = TRUE,
  echo = pkgconfig::get_config("baitfindR::echo", fallback = FALSE),
  ...
)

Arguments

path_to_ys

Character vector of length one; the path to the folder containing the Yang and Smith (Y&S) Python scripts, e.g., "/Users/me/apps/phylogenomic_dataset_construction/"

seq_folder

Character vector of length one; the path to the folder containing the fasta files.

number_cores

Numeric; number of threads to use for RAxML and mafft.

seq_type

Character vector of length one indicating the type of sequences. Should be either "dna" for DNA or "aa" for proteins.

bootstrap

Logical; should a bootstrap analysis be run for the trees?

overwrite

Logical; should previous output of this command be erased so new output can be written? Once erased it cannot be restored, so use with caution!

get_hash

Logical; should the MD5 hash (a 32-character hexadecimal string) be computed for all output tree files concatenated together? Used by drake_plan for tracking during workflows. If TRUE, this function will return the hash.
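To illustrate what "one hash for all tree files concatenated together" means, here is a base-R sketch using tools::md5sum(). The helper name hash_tree_files(), the alphabetical file order, and the byte-for-byte concatenation are all assumptions; baitfindR may compute its hash differently (for example, with another hashing package), so the value is not guaranteed to match what fasta_to_tree() returns.

```r
# Hypothetical helper (not baitfindR's implementation): concatenate every
# .tre file in a folder, in sorted order, and return one MD5 hash.
hash_tree_files <- function(folder) {
  tree_files <- sort(list.files(folder, pattern = "\\.tre$", full.names = TRUE))
  combined <- tempfile(fileext = ".txt")
  on.exit(unlink(combined), add = TRUE)
  # Copy the raw bytes of each tree file into one temporary file
  con <- file(combined, open = "wb")
  for (f in tree_files) {
    writeBin(readBin(f, what = "raw", n = file.size(f)), con)
  }
  close(con)
  # tools::md5sum() returns a named character vector; drop the file name
  unname(tools::md5sum(combined))
}

# Demo with two small Newick files
tree_dir <- tempfile("trees")
dir.create(tree_dir)
writeLines("(A,B);", file.path(tree_dir, "cluster1.raxml.tre"))
writeLines("(C,D);", file.path(tree_dir, "cluster2.fasttree.tre"))
h <- hash_tree_files(tree_dir)
print(h)
```

Because any change to any tree file changes the combined hash, a workflow tool only needs to compare this one string to detect that downstream targets are out of date.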

echo

Logical; should the standard output and error be printed to the screen?

...

Other arguments. Not used by this function, but meant to be used by drake_plan for tracking during workflows.

Details

Wrapper for the fasta_to_tree.py script of Yang and Smith (2014).

Value

For each input cluster in seq_folder (e.g., cluster1.fa), three files are written to seq_folder: an alignment, cluster1.fa.mafft.aln (small clusters) or cluster1.pasta.aln (large clusters); an alignment with poorly aligned sites removed, cluster1.fa.mafft.aln-cln (small clusters) or cluster1.fa.pasta.aln-cln (large clusters); and a tree, cluster1.raxml.tre (small clusters) or cluster1.fasttree.tre (large clusters). If get_hash is TRUE, the function returns the MD5 hash (a 32-character hexadecimal string) computed for all .tre files concatenated together.
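The naming pattern above can be captured in a small lookup, for instance to check which outputs already exist in seq_folder after a run. The sketch below, with a hypothetical helper expected_outputs(), simply encodes the file names exactly as listed in this section (including cluster1.pasta.aln for large clusters); it is an illustration, not code from baitfindR, and it does not verify what the underlying Python script actually writes.

```r
# Hypothetical helper: expected output names for one cluster file,
# following the patterns listed in the Value section.
expected_outputs <- function(cluster_file, large) {
  stem <- sub("\\.fa$", "", cluster_file)  # "cluster1.fa" -> "cluster1"
  if (large) {
    c(alignment = paste0(stem, ".pasta.aln"),
      trimmed   = paste0(cluster_file, ".pasta.aln-cln"),
      tree      = paste0(stem, ".fasttree.tre"))
  } else {
    c(alignment = paste0(cluster_file, ".mafft.aln"),
      trimmed   = paste0(cluster_file, ".mafft.aln-cln"),
      tree      = paste0(stem, ".raxml.tre"))
  }
}

small_out <- expected_outputs("cluster1.fa", large = FALSE)
large_out <- expected_outputs("cluster2.fa", large = TRUE)
print(small_out)
print(large_out)
```

Combined with file.exists(file.path(seq_folder, small_out)), this gives a quick way to spot clusters whose run did not finish.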

Author(s)

Joel H Nitta, joelnitta@gmail.com

References

Yang, Y. and S.A. Smith. 2014. Orthology inference in non-model organisms using transcriptomes and low-coverage genomes: improving accuracy and matrix occupancy for phylogenomics. Molecular Biology and Evolution 31:3081-3092. https://bitbucket.org/yangya/phylogenomic_dataset_construction/overview

Examples

## Not run: 
fasta_to_tree(
  seq_folder = "some/folder/containing/fasta/seqs",
  number_cores = 1,
  seq_type = "dna",
  bootstrap = FALSE
)
## End(Not run)

joelnitta/baitfindR documentation built on May 7, 2020, 6:21 p.m.