cdhitGrouping: Gene grouping by preclustering with CD-HIT
In FindMyFriends: Microbial Comparative Genomics in R


This grouping algorithm partly mimics the approach used by Roary, but instead of using BLAST in the second pass it uses cosine similarity of kmer feature vectors, which provides a further speedup. The algorithm first uses CD-HIT to precluster highly similar sequences. It then extracts a representative from each precluster and groups these representatives using the standard FindMyFriends kmer cosine similarity.
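To make the second-pass measure concrete, the following base R sketch counts overlapping kmers in two sequences and computes the cosine similarity of the resulting count vectors. It is illustrative only and is not the package's implementation: `kmer_counts`, `cosine_similarity` and `kmer_similarity` are hypothetical helper names, and the cut-off argument only mirrors the documented meaning of `lowerLimit`.

```r
# Count overlapping kmers of size k in a single sequence string.
# Returns a named integer vector (names are the kmers).
kmer_counts <- function(x, k) {
  n <- nchar(x)
  if (n < k) return(integer(0))
  starts <- seq_len(n - k + 1)
  kmers <- substring(x, starts, starts + k - 1)
  tab <- table(kmers)
  setNames(as.integer(tab), names(tab))
}

# Cosine similarity between two named kmer count vectors.
# Only kmers present in both vectors contribute to the dot product.
cosine_similarity <- function(a, b) {
  shared <- intersect(names(a), names(b))
  dot <- sum(as.numeric(a[shared]) * as.numeric(b[shared]))
  denom <- sqrt(sum(as.numeric(a)^2)) * sqrt(sum(as.numeric(b)^2))
  if (denom == 0) return(0)
  dot / denom
}

# Hypothetical wrapper: similarities below lowerLimit are set to zero,
# following the description of the lowerLimit argument.
kmer_similarity <- function(x, y, k = 3, lowerLimit = 0.5) {
  sim <- cosine_similarity(kmer_counts(x, k), kmer_counts(y, k))
  if (sim < lowerLimit) 0 else sim
}

# Two short peptide-like strings sharing half of their 3-mers
kmer_similarity("MKVLAA", "MKVLGG", k = 3, lowerLimit = 0.5)
```

Because the comparison works on kmer counts rather than alignments, it needs no pairwise alignment step, which is where the speedup over a BLAST-based second pass would come from.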

cdhitGrouping(object, ...)

## S4 method for signature 'pgVirtual'
cdhitGrouping(object, kmerSize, lowerLimit,
  maxLengthDif, geneChunkSize, cdhitOpts, cdhitIter = TRUE, nrep = 1,
  from = 0.9, by = 0.05)

`object`	A pgVirtual subclass
`...`	Parameters passed on.
`kmerSize`	The size of the kmers used for the comparison. If two values are given, the first will be used for the CD-HIT algorithm and the second for the cosine similarity calculations.
`lowerLimit`	A numeric giving the lower bound of similarity. Similarities below this value are set to zero.
`maxLengthDif`	The maximum deviation in sequence length to allow during preclustering with CD-HIT. Values below 1 describe a proportion of the sequence length. Values above 1 describe a fixed length.
`geneChunkSize`	The maximum number of genes to pass to the CD-HIT algorithm. If object contains more genes than this, CD-HIT will be run in chunks and the results combined with a second CD-HIT pass before the final cosine similarity grouping.
`cdhitOpts`	Additional arguments passed on to CD-HIT. It should be a named list with names corresponding to the arguments expected by the CD-HIT algorithm (without the dash). i, n and s/S will be overwritten based on the other parameters given to this function, and all values in cdhitOpts will be converted to character using as.character.
`cdhitIter`	Logical. If TRUE, the preclustered groups are grouped by gradually lowering the threshold in CD-HIT. If FALSE, kmer similarities are calculated directly between all preclusters and used for the grouping. Defaults to TRUE.
`nrep`	If `cdhitIter = TRUE`, controls how many iterations should be performed at each threshold level. Defaults to 1.
`from`	The starting similarity threshold to use for the iterative CD-HIT grouping. Together with by and nrep it defines the number of times and the levels at which CD-HIT is run. Defaults to 0.9.
`by`	The step size to use for the iterative CD-HIT grouping. Defaults to 0.05.
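Two of the conventions above can be sketched in base R. This is one plausible reading of the argument descriptions, not the package's code: `within_length_limit` and `threshold_schedule` are hypothetical names, measuring the proportion against the longer sequence is an assumption, treating a value of exactly 1 as a fixed length is an assumption, and the stopping level `to` is a hypothetical parameter, since the documentation does not state where the iteration ends.

```r
# Assumed reading of maxLengthDif: below 1 it is a proportion of the longer
# sequence's length, otherwise it is an absolute difference in length.
within_length_limit <- function(len_a, len_b, maxLengthDif) {
  len_diff <- abs(len_a - len_b)
  if (maxLengthDif < 1) {
    len_diff / max(len_a, len_b) <= maxLengthDif
  } else {
    len_diff <= maxLengthDif
  }
}

# Assumed schedule for the iterative grouping: start at `from`, step down
# by `by`, and repeat every level `nrep` times. `to` is hypothetical.
threshold_schedule <- function(from = 0.9, by = 0.05, nrep = 1, to = 0.5) {
  levels <- seq(from, to, by = -by)
  rep(levels, each = nrep)
}

within_length_limit(100, 95, maxLengthDif = 0.1)  # proportion of length
within_length_limit(100, 80, maxLengthDif = 30)   # fixed length
threshold_schedule(from = 0.9, by = 0.1, nrep = 2, to = 0.7)
```

Under these assumptions, raising `nrep` or lowering `by` increases the number of CD-HIT runs, which trades running time for a more gradual merging of preclusters.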

An object of the same class as 'object'.

pgVirtual: Grouping using cdhit for all pgVirtual subclasses

Page, A. J., Cummins, C. A., Hunt, M., Wong, V. K., Reuter, S., Holden, M. T. G., et al. (2015). Roary: rapid large-scale prokaryote pan genome analysis. Bioinformatics, btv421.

Fu, L., Niu, B., Zhu, Z., Wu, S., Li, W. (2012). CD-HIT: accelerated for clustering the next generation sequencing data. Bioinformatics, 28 (23), 3150–3152.

Li, W., Godzik, A. (2006). Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics, 22, 1658–1659.

Other grouping algorithms: gpcGrouping, graphGrouping, manualGrouping

testPG <- .loadPgExample()

testPG <- cdhitGrouping(testPG)
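The call above relies on the defaults stored in the example object. A variant that sets some of the documented arguments explicitly might look like the sketch below. The argument values are illustrative assumptions rather than recommendations, `grouped` is just a name chosen here, and the `requireNamespace` guard is only there so the snippet does nothing when FindMyFriends is not installed.

```r
if (requireNamespace("FindMyFriends", quietly = TRUE)) {
  library(FindMyFriends)

  testPG <- .loadPgExample()

  # Iterative CD-HIT grouping with three runs per threshold level;
  # kmerSize = 5 and nrep = 3 are illustrative values only.
  grouped <- cdhitGrouping(testPG, kmerSize = 5, cdhitIter = TRUE, nrep = 3)

  # The result has the same class as the input object
  print(class(grouped))
}
```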

FindMyFriends documentation built on Nov. 8, 2020, 6:46 p.m.
