Forms Groups By Rank

Description

Agglomerates sequences into groups within a certain size range based on taxonomic rank.

Usage

FormGroups(dbFile,
           tblName = "Seqs",
           goalSize = 1000,
           minGroupSize = 500,
           maxGroupSize = 10000,
           add2tbl = FALSE,
           verbose = TRUE)

Arguments

dbFile

A SQLite connection object or a character string specifying the path to the database file.

tblName

Character string specifying the table where the rank information is located.

goalSize

Number of sequences required in each group to stop adding more sequences.

minGroupSize

Minimum number of sequences in each group required to stop trying to recombine with a larger group.

maxGroupSize

Maximum number of sequences in each group allowed to continue agglomeration.

add2tbl

Logical or a character string specifying the table name in which to add the result.

verbose

Logical indicating whether to print database queries and other information.

Details

FormGroups uses the “rank” field in the tblName table of dbFile to group sequences with similar taxonomic rank. The rank information must already be present in tblName, as it is by default when sequences are imported from a GenBank formatted file. It must not contain repeated taxonomic names belonging to different lineages.

Beginning with the least common ranks, the algorithm agglomerates groups with similar ranks until the goalSize is reached. If the group size is below minGroupSize then further agglomeration is attempted with a larger group. If additional agglomeration results in a group larger than maxGroupSize then the agglomeration is undone so that the group is smaller.
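
The idea of agglomerating by rank can be illustrated with a small base R sketch. This is not the FormGroups implementation: the helper toy_groups, its arguments, and the toy lineages are hypothetical, it assumes each rank is a single string with levels separated by "; ", and it ignores minGroupSize entirely. Each sequence simply moves up its lineage until the taxon holds at least goal_size sequences, and never moves to a taxon holding more than max_size.

```r
# Toy lineages (made-up data): one string per sequence, levels separated by "; ".
lineages <- c(
  rep("Root; Bacteria; Proteobacteria; Escherichia", 3),
  rep("Root; Bacteria; Proteobacteria; Salmonella", 2),
  rep("Root; Bacteria; Firmicutes; Bacillus", 4),
  rep("Root; Archaea; Euryarchaeota; Methanococcus", 1)
)

# Hypothetical helper, not part of DECIPHER: label each sequence with the
# first taxon (walking from leaf toward root) whose sequence count reaches
# goal_size, without ever stepping up to a taxon larger than max_size.
toy_groups <- function(lineages, goal_size, max_size) {
  levels_list <- strsplit(lineages, "; ", fixed = TRUE)
  vapply(levels_list, function(lv) {
    chosen <- lv[length(lv)]
    for (depth in rev(seq_along(lv))) {
      prefix <- paste(lv[seq_len(depth)], collapse = "; ")
      # count sequences that sit at or below this taxon
      n <- sum(lineages == prefix | startsWith(lineages, paste0(prefix, "; ")))
      if (n > max_size) break   # too large: keep the previous, smaller taxon
      chosen <- lv[depth]
      if (n >= goal_size) break # large enough: stop agglomerating
    }
    chosen
  }, character(1))
}

groups <- toy_groups(lineages, goal_size = 4, max_size = 6)
table(groups)
```

With these toy inputs the two Proteobacteria genera merge into one group, Bacillus already meets the goal on its own, and the lone archaeal sequence climbs only as far as Archaea because Root would exceed max_size. In this simplification a small group is left as is, whereas FormGroups is described above as attempting further agglomeration when a group falls below minGroupSize.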

Value

A data.frame containing each rank and the identifier of the group it was assigned to. Note that quotes are stripped from identifiers to prevent problems that they may cause. The origin gives the rank preceding the identifier. If add2tbl is not FALSE then the “identifier” and “origin” columns are updated in dbFile.

Author(s)

Erik Wright DECIPHER@cae.wisc.edu

See Also

IdentifyByRank

Examples

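library(DECIPHER)
# path to the example database shipped with DECIPHER
db <- system.file("extdata", "Bacteria_175seqs.sqlite", package="DECIPHER")
# form groups of about 10 sequences, bounded between 5 and 20
g <- FormGroups(db, goalSize=10, minGroupSize=5, maxGroupSize=20)
head(g)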