Sequences are processed one at a time: each sequence is either assigned to an existing cluster or, if no matching cluster is found, becomes the representative of a new cluster.
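The sequential assignment described above can be illustrated with a minimal R sketch. This is not the package's or CD-HIT's implementation: `toy_identity` and `greedy_cluster` are hypothetical helpers, the identity measure is a crude ungapped position-by-position comparison rather than an alignment, and processing the longest sequences first is an assumption made for this illustration.

```r
# Toy identity: share of matching positions over the shorter sequence,
# compared left-aligned and without gaps (a stand-in, NOT CD-HIT's alignment).
toy_identity <- function(a, b) {
  n <- min(nchar(a), nchar(b))
  if (n == 0) return(0)
  x <- strsplit(substr(a, 1, n), "")[[1]]
  y <- strsplit(substr(b, 1, n), "")[[1]]
  mean(x == y)
}

# Greedy incremental clustering over a named character vector of sequences.
greedy_cluster <- function(seqs, identity_cutoff = 0.9) {
  ord <- order(nchar(seqs), decreasing = TRUE)  # assumed: longest first
  reps <- character(0)                          # cluster representatives so far
  cluster_id <- integer(length(seqs))
  for (i in ord) {
    hit <- 0L
    for (k in seq_along(reps)) {
      if (toy_identity(seqs[i], reps[k]) >= identity_cutoff) {
        hit <- k                                # join the first matching cluster
        break
      }
    }
    if (hit == 0L) {                            # no match: new representative
      reps <- c(reps, seqs[i])
      hit <- length(reps)
    }
    cluster_id[i] <- hit
  }
  data.frame(SequenceID = names(seqs), ClusterID = cluster_id,
             stringsAsFactors = FALSE)
}

toy_seqs <- c(s1 = "ACGTACGTAC", s2 = "ACGTACGTAA", s3 = "TTTTGGGGCC")
toy_clusters <- greedy_cluster(toy_seqs, identity_cutoff = 0.8)
```

With these toy inputs, s1 and s2 differ at one position and end up in the same cluster, while s3 starts a cluster of its own.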
runClustering(cdhit_path, sequences, out_dir, identity_cutoff,
  length_cutoff, wordlength, map, write_fastas = FALSE, optional = "")
cdhit_path | Path to the cd-hit-est executable
sequences | Vector of sequences in FASTA style, as generated by sequencesAsFasta
out_dir | Directory in which the clustering output files are saved
identity_cutoff | Sequence identity cutoff used for clustering
length_cutoff | Length difference cutoff
wordlength | CD-HIT word length
map | A data frame with sequences as row names and sequence identifiers in the first column. Can be generated by createMap
write_fastas | Boolean indicating whether a FASTA file is generated for each cluster
optional | Optional execution parameters
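How these arguments reach cd-hit-est is not stated here. The sketch below shows one plausible mapping onto the standard cd-hit-est options `-i`, `-o`, `-c` (identity threshold), `-s` (length difference cutoff) and `-n` (word length). `build_cdhit_args` is a hypothetical helper, and the mapping itself is an assumption rather than the package's documented behavior.

```r
# Hypothetical helper: assemble a cd-hit-est argument vector.
# The argument-to-flag mapping is assumed, not taken from the package source.
build_cdhit_args <- function(input_fasta, out_dir, identity_cutoff,
                             length_cutoff, wordlength, optional = "") {
  args <- c("-i", input_fasta,
            "-o", file.path(out_dir, "CD-HIT.fa"),
            "-c", identity_cutoff,   # sequence identity threshold
            "-s", length_cutoff,     # length difference cutoff
            "-n", wordlength)        # word length
  # Append any extra options, split on whitespace
  if (nzchar(optional)) args <- c(args, strsplit(optional, "\\s+")[[1]])
  args
}

cdhit_args <- build_cdhit_args("input.fa", "results", 0.9, 0.8, 8,
                               optional = "-T 4")
# The command could then be run with, for example:
# system2(cdhit_path, cdhit_args)
```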
A data frame with the columns 'SequenceID' and 'ClusterID' that assigns each sequence, via its identifier, to a cluster of similar sequences. Additionally, the files CD-HIT.fa and CD-HIT.fa.clstr and a folder Clusters are generated in the given output directory. The CD-HIT.fa file is the FASTA file of all cluster representatives. The CD-HIT.fa.clstr file lists all identified clusters and their assigned sequence identifiers, together with the percentage of sequence overlap with the cluster representative. The Clusters directory contains one FASTA file per cluster.
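To illustrate how a cluster listing relates to the returned data frame, here is a minimal sketch that parses lines in the usual CD-HIT .clstr layout into 'SequenceID' and 'ClusterID' columns. `parse_clstr` is a hypothetical helper, the example lines are made up for illustration, and reusing CD-HIT's numeric cluster index as 'ClusterID' is an assumption; the package may label clusters differently.

```r
# Hypothetical parser for .clstr-style lines (not the package's own code).
# Assumes the first line is a ">Cluster N" header.
parse_clstr <- function(lines) {
  is_header <- grepl("^>Cluster ", lines)
  header_ids <- as.integer(sub("^>Cluster ", "", lines[is_header]))
  # Each member line belongs to the most recent header above it
  cluster_of_line <- header_ids[cumsum(is_header)]
  members <- lines[!is_header]
  # Member lines look like: "0<TAB>10nt, >s1... *"
  seq_ids <- sub("^.*, >(.*?)\\.\\.\\..*$", "\\1", members, perl = TRUE)
  data.frame(SequenceID = seq_ids,
             ClusterID = cluster_of_line[!is_header],
             stringsAsFactors = FALSE)
}

# Made-up example content in .clstr layout
clstr_lines <- c(">Cluster 0",
                 "0\t10nt, >s1... *",
                 "1\t10nt, >s2... at +/90.00%",
                 ">Cluster 1",
                 "0\t10nt, >s3... *")
assignments <- parse_clstr(clstr_lines)
```

For a real run, the lines could be read with `readLines(file.path(out_dir, "CD-HIT.fa.clstr"))`.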