baitfindR: Find Baits for Sequence Capture


Given a folder containing DNA sequences in multi-fasta format (i.e., each fasta file contains more than one sequence) and a dataframe with taxonomic data and ingroup/outgroup status, `filter_fasta()` returns the fasta files that pass either or both of two filters. The first filter excludes fasta files that contain fewer ingroup sequences than the required minimum (`min_taxa`). The second filter excludes fasta files that lack at least one sequence per ingroup taxon at the specified taxonomic rank (`filter_col`).

filter_fasta(
  seq_folder,
  taxonomy_data,
  filter_col = NULL,
  min_taxa = NULL,
  exclude_short = FALSE,
  sample_col = "sample",
  group_col = "group",
  ...
)

`seq_folder`	Character vector of length one; the path to the folder containing the fasta files (ending in `.fa` or `.fasta`) to filter.
`taxonomy_data`	Dataframe matching sequences to ingroup/outgroup status and (optionally) higher-level taxonomic ranks for filtering. The columns must follow this format:
  `sample`	Unique identifier for the source of the sequence, such as a transcriptome ID or species name. All sequence names must include such an identifier.
  `group`	Either "in" or "out" (case-insensitive), depending on whether that sample is in the ingroup or outgroup.
  (user-selected taxonomic rank)	Any taxonomic rank the user wishes to filter by. For example, alignments can be filtered so that each contains at least one representative of every ingroup genus (family, order, etc.) in the dataset.
`filter_col`	Optional character; the name of the column to be used for filtering by taxonomic rank in `taxonomy_data`.
`min_taxa`	Minimum number of ingroup samples required to pass the filter.
`exclude_short`	Logical; should extremely short sequences be excluded from the alignment during filtering? If `TRUE`, the minimum length is set to be within 1 standard deviation of the mean sequence length for a given alignment.
`sample_col`	Optional character; user-provided column name for `sample` in `taxonomy_data`.
`group_col`	Optional character; user-provided column name for `group` in `taxonomy_data`.
`...`	Other arguments. Not used by this function, but meant to be used by `drake_plan` for tracking during workflows.
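
The `exclude_short` description does not spell out how the cutoff is computed. One possible reading, shown here only as a sketch and not taken from the package source, is that sequences shorter than the mean length minus one standard deviation are dropped. The sequence names and lengths below are made up for illustration.

```r
# Hypothetical sequence lengths (in bp) for a single alignment
seq_lengths <- c(seq1 = 900, seq2 = 950, seq3 = 1000, seq4 = 1050, seq5 = 200)

# Assumed cutoff: one standard deviation below the mean length.
# This is an interpretation of the argument description, not the
# package's confirmed behavior.
min_length <- mean(seq_lengths) - sd(seq_lengths)

# Keep only sequences at or above the cutoff
kept <- seq_lengths[seq_lengths >= min_length]
names(kept)
```

Under this reading, only the one very short sequence (`seq5`) falls below the cutoff.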

For example, if the dataset includes multiple ingroup genera each with multiple samples per genus, we may wish to filter alignments such that we only keep those with at least one sequence per ingroup genus. To do this, include a column called "genus" in taxonomy_data, and set filter_col = "genus".
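
The logic of the two filters can be sketched in base R for a single fasta file. This is an illustration of the idea, not the package's implementation: the sample IDs, genus names, and sequence names are hypothetical, samples are matched to sequences by simple substring search, and `min_taxa` is treated as an inclusive minimum (an assumption based on the argument description).

```r
# Hypothetical taxonomy table in the format expected for `taxonomy_data`
taxonomy_data <- data.frame(
  sample = c("AAAA", "BBBB", "CCCC", "DDDD"),
  group  = c("in", "in", "IN", "out"),   # case-insensitive
  genus  = c("Genus1", "Genus1", "Genus2", "Genus3"),
  stringsAsFactors = FALSE
)

# Hypothetical sequence names from one fasta file; each contains a sample ID
seq_names <- c("AAAA-contig1", "CCCC-contig7", "DDDD-contig3")

# Which samples are represented among the sequence names?
present <- vapply(
  taxonomy_data$sample,
  function(id) any(grepl(id, seq_names, fixed = TRUE)),
  logical(1)
)
is_ingroup <- tolower(taxonomy_data$group) == "in"

# Filter 1: enough ingroup samples (assumed inclusive, i.e. >= min_taxa)
min_taxa <- 2
n_ingroup <- sum(present & is_ingroup)
passes_min <- n_ingroup >= min_taxa

# Filter 2: at least one sequence per ingroup genus (filter_col = "genus")
ingroup_genera <- unique(taxonomy_data$genus[is_ingroup])
genera_present <- unique(taxonomy_data$genus[present & is_ingroup])
passes_rank <- all(ingroup_genera %in% genera_present)

passes_min && passes_rank
```

Here the file contains ingroup samples from both `Genus1` and `Genus2`, so it would pass both filters under these assumptions.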

A named list of DNA sequences of class DNAbin that passed the filter. These are not modified in any way; they simply met the requirements of the filter.

Joel H Nitta, joelnitta@gmail.com

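## Not run: 
library(baitfindR)
# Keep fasta files with at least one sequence per ingroup genus
# and the required minimum number of ingroup samples
filter_fasta(
  seq_folder = "some/folder/",
  taxonomy_data = onekp_data,
  filter_col = "genus",
  min_taxa = 2)
## End(Not run)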

joelnitta/baitfindR documentation built on May 7, 2020, 6:21 p.m.

filter_fasta: Filter fasta files by ingroup/outgroup status and taxonomy.