Agglomerates sequences into groups within a specified size range based on taxonomic rank.
FormGroups(dbFile,
           tblName = "Seqs",
           goalSize = 50,
           minGroupSize = 25,
           maxGroupSize = 5000,
           includeNames = FALSE,
           add2tbl = FALSE,
           verbose = TRUE)
dbFile | A SQLite connection object or a character string specifying the path to the database file.
tblName | Character string specifying the table where the rank information is located.
goalSize | Number of sequences required in each group to stop adding more sequences.
minGroupSize | Minimum number of sequences in each group required to stop trying to recombine with a larger group.
maxGroupSize | Maximum number of sequences in each group allowed to continue agglomeration.
includeNames | Logical indicating whether to include the formal scientific name in the group name.
add2tbl | Logical or a character string specifying the table name in which to add the result.
verbose | Logical indicating whether to display progress.
FormGroups uses the “rank” field in the dbFile table to group sequences with similar taxonomic rank. Rank information must be present in the tblName, such as that created by default when importing sequences from a GenBank formatted file.
Rank information contains the formal scientific name on the first line, followed by the taxonomic lineage on subsequent lines. When includeNames is TRUE the formal scientific name is appended to the end of the group name, otherwise only the taxonomic lineage is used as the group name.
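To make the layout of a rank entry concrete, the following base R sketch splits one entry into its formal scientific name and its lineage. The example string, the `;` separator, the trailing period, and the helper `parse_rank` are all assumptions made for illustration; they are not part of DECIPHER and may not match every database.

```r
# Hypothetical rank entry: scientific name on the first line,
# taxonomic lineage on the following lines (assumed ";"-separated).
rank <- paste0("Escherichia coli\n",
               " Bacteria; Proteobacteria; Gammaproteobacteria;\n",
               " Enterobacterales; Enterobacteriaceae; Escherichia.")

# Illustrative helper (not a DECIPHER function).
parse_rank <- function(rank) {
  lines <- trimws(strsplit(rank, "\n", fixed = TRUE)[[1]])
  name <- lines[1]                                  # first line: formal name
  lineage <- paste(lines[-1], collapse = " ")       # remaining lines: lineage
  lineage <- sub("\\.$", "", lineage)               # drop an assumed trailing period
  taxa <- trimws(strsplit(lineage, ";", fixed = TRUE)[[1]])
  list(name = name, taxa = taxa)
}

parsed <- parse_rank(rank)
parsed$name   # the formal scientific name
parsed$taxa   # the lineage, from broadest to most specific taxon
```

With includeNames = FALSE only the lineage part contributes to the group name; with includeNames = TRUE the name part is appended as well.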
The algorithm ascends the taxonomic tree, agglomerating taxa into groups until the goalSize is reached. If the group size is below minGroupSize then further agglomeration is attempted with a larger group. If additional agglomeration results in a group larger than maxGroupSize then the agglomeration is undone so that the group is smaller. Setting minGroupSize to goalSize avoids the creation of polyphyletic groups. Note that this approach may often result in paraphyletic groups.
A data.frame with the rank and corresponding group name as identifier. Note that quotes are stripped from group names to prevent problems that they may cause. The origin gives the rank preceding the identifier. The count denotes the number of sequences corresponding to each rank. If add2tbl is not FALSE then the “identifier” and “origin” columns are updated in dbFile.
Erik Wright eswright@pitt.edu
library(DECIPHER)
# example database shipped with the package
db <- system.file("extdata", "Bacteria_175seqs.sqlite", package="DECIPHER")
g <- FormGroups(db, goalSize=10, minGroupSize=5, maxGroupSize=20)
head(g)
# total number of sequences in each group
tapply(g$count, g$identifier, sum)