Description
Given a folder containing DNA sequences in multi-fasta format (i.e., each fasta file contains more than one sequence) and a data frame including taxonomic data and ingroup/outgroup status, filter_fasta() outputs a list of the fasta files that pass one of two filters, or a combination of both. One filter excludes fasta files that do not contain more than the minimum number of ingroup sequences (min_taxa). The other filter excludes fasta files that do not contain at least one sequence per ingroup taxon at the specified taxonomic rank (filter_col).
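The minimum-sample filter can be sketched in base R as below. This is a hypothetical illustration, not the code of filter_fasta(): each fasta file is represented only by the names of its sequences (standing in for a DNAbin alignment), the ingroup label "ingroup" is assumed, and filter_min_taxa() is an invented helper name. The sketch treats min_taxa as inclusive ("at least min_taxa ingroup sequences"); the exact boundary used by filter_fasta() may differ.

```r
# Hypothetical input: each fasta file is represented only by the names of the
# sequences it contains (a stand-in for the row names of a DNAbin alignment).
seqs_by_file <- list(
  gene1.fasta = c("sp_A1", "sp_A2", "sp_B1", "out_1"),
  gene2.fasta = c("sp_A1", "out_1"),
  gene3.fasta = c("sp_B1", "sp_B2", "out_1")
)

# Hypothetical taxonomy table; the "ingroup"/"outgroup" labels are assumed.
taxonomy_data <- data.frame(
  sample = c("sp_A1", "sp_A2", "sp_B1", "sp_B2", "out_1"),
  group  = c("ingroup", "ingroup", "ingroup", "ingroup", "outgroup"),
  genus  = c("A", "A", "B", "B", "Out"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep files with at least `min_taxa` ingroup sequences
# (the inclusive boundary is an assumption of this sketch).
filter_min_taxa <- function(seqs_by_file, taxonomy_data, min_taxa,
                            sample_col = "sample", group_col = "group") {
  is_ingroup <- taxonomy_data[[group_col]] == "ingroup"
  ingroup <- taxonomy_data[[sample_col]][is_ingroup]
  # Count ingroup sequences in each file
  n_ingroup <- vapply(seqs_by_file, function(s) sum(s %in% ingroup), integer(1))
  # Subset the list; the retained elements themselves are left unchanged
  seqs_by_file[n_ingroup >= min_taxa]
}

passed <- filter_min_taxa(seqs_by_file, taxonomy_data, min_taxa = 2)
print(names(passed))
```

Here gene1.fasta has three ingroup sequences, gene2.fasta one, and gene3.fasta two, so only gene2.fasta is dropped.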
Usage
filter_fasta(
  seq_folder,
  taxonomy_data,
  filter_col = NULL,
  min_taxa = NULL,
  exclude_short = FALSE,
  sample_col = "sample",
  group_col = "group",
  ...
)
Arguments
seq_folder: Character vector of length one; the path to the folder containing the fasta files (ending in …).
taxonomy_data: Data frame matching sequences to ingroup/outgroup status and (optionally) higher-level taxonomic ranks for filtering. The columns must follow this format: …
filter_col: Optional character; the name of the column to be used for filtering by taxonomic rank in …
min_taxa: Minimum number of ingroup samples required to pass the filter.
exclude_short: Logical; should extremely short sequences be excluded from the alignment during filtering? If …
sample_col: Optional character; user-provided column name for …
group_col: Optional character; user-provided column name for …
...: Other arguments. Not used by this function, but meant to be used by …
Details
For example, if the dataset includes multiple ingroup genera, each with multiple samples, we may wish to filter alignments so that we keep only those with at least one sequence per ingroup genus. To do this, include a column called "genus" in taxonomy_data and set filter_col = "genus".
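The per-rank filter described above can be sketched in base R as follows. This is a hypothetical illustration, not the implementation of filter_fasta(): each fasta file is reduced to the names of its sequences, the ingroup label "ingroup" is assumed, and filter_by_rank() is an invented helper name. A file passes only if every ingroup taxon at the rank named by filter_col is represented by at least one sequence.

```r
# Hypothetical input: sequence names per fasta file (stand-in for DNAbin data).
seqs_by_file <- list(
  gene1.fasta = c("sp_A1", "sp_A2", "sp_B1", "out_1"),
  gene2.fasta = c("sp_A1", "out_1"),
  gene3.fasta = c("sp_B1", "sp_B2", "out_1")
)

# Hypothetical taxonomy table with a "genus" column used as filter_col.
taxonomy_data <- data.frame(
  sample = c("sp_A1", "sp_A2", "sp_B1", "sp_B2", "out_1"),
  group  = c("ingroup", "ingroup", "ingroup", "ingroup", "outgroup"),
  genus  = c("A", "A", "B", "B", "Out"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep files containing at least one sequence from every
# ingroup taxon at the rank given by `filter_col`.
filter_by_rank <- function(seqs_by_file, taxonomy_data, filter_col,
                           sample_col = "sample", group_col = "group") {
  is_ingroup <- taxonomy_data[[group_col]] == "ingroup"
  ingroup_taxa <- unique(taxonomy_data[[filter_col]][is_ingroup])
  # Look up each sample's taxon at the chosen rank (NA for unknown samples)
  taxon_of <- setNames(taxonomy_data[[filter_col]], taxonomy_data[[sample_col]])
  keep <- vapply(seqs_by_file, function(s) {
    all(ingroup_taxa %in% taxon_of[s])
  }, logical(1))
  seqs_by_file[keep]
}

passed <- filter_by_rank(seqs_by_file, taxonomy_data, filter_col = "genus")
print(names(passed))
```

Only gene1.fasta contains both ingroup genera (A and B); gene2.fasta lacks genus B and gene3.fasta lacks genus A, so both are dropped.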
Value
A named list of DNA sequences of class DNAbin that passed the filter. These are not modified in any way; they simply met the requirements of the filter.
Author(s)
Joel H Nitta, joelnitta@gmail.com
Examples
## Not run:
filter_fasta(
  seq_folder = "some/folder/",
  taxonomy_data = onekp_data,
  filter_col = "genus",
  min_taxa = 2)
## End(Not run)