FindMyFriends-package: FindMyFriends: Comparative microbial genomics in R

Description

FindMyFriends: Comparative microbial genomics in R

Details

This package has two objectives: to define a framework for working with pangenomic data in R, and to provide speed- and memory-efficient algorithms that make it possible to create huge pangenomes in a reasonable amount of time. Besides providing novel algorithms of its own, it can import results from other algorithms into the framework, which facilitates post-processing of results from tools that only provide an initial grouping of genes.

To balance speed and memory consumption, FindMyFriends provides two sequence storage modes: in-memory, or as a reference to the original FASTA file. The former excels in lookup speed but can become unwieldy for big pangenomes with Gb of sequence data. The latter can handle very large sets of genes but can slow down calculations because sequence lookups take longer.
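The trade-off between the two storage modes can be illustrated with a minimal, language-neutral sketch. It is written in Python for brevity and is not the FindMyFriends implementation. The function names `load_in_memory`, `index_offsets` and `fetch` are hypothetical: the first keeps every sequence in memory, while the other two keep only a byte offset per gene and re-read the FASTA file on each lookup.

```python
def load_in_memory(path):
    """In-memory mode: read every record once; lookups are dict accesses."""
    parts_by_name = {}
    name = None
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                # Use the first word of the header as the gene name
                name = line[1:].split()[0]
                parts_by_name[name] = []
            elif name is not None and line:
                parts_by_name[name].append(line)
    return {n: "".join(parts) for n, parts in parts_by_name.items()}


def index_offsets(path):
    """File-reference mode: store only the byte offset of each header."""
    offsets = {}
    with open(path, "rb") as fh:
        while True:
            pos = fh.tell()
            line = fh.readline()
            if not line:
                break
            if line.startswith(b">"):
                offsets[line[1:].split()[0].decode()] = pos
    return offsets


def fetch(path, offsets, name):
    """Look up one sequence by seeking into the file (slower per lookup)."""
    parts = []
    with open(path, "rb") as fh:
        fh.seek(offsets[name])
        fh.readline()  # skip the header line
        for line in fh:
            if line.startswith(b">"):
                break  # reached the next record
            parts.append(line.strip())
    return b"".join(parts).decode()
```

With the dictionary, memory use grows with the total sequence length. With the offset index, it grows only with the number of genes, at the cost of a file read per lookup.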

The novelty of the FindMyFriends algorithms lies primarily in their use of alignment-free sequence comparisons based on cosine similarity of kmer feature vectors. This is substantially faster than BLAST while retaining the needed resolution. Another novelty is the introduction of Guided Pairwise Comparison, an alternative to standard all-vs-all comparisons.
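As a rough illustration of the general idea of cosine similarity on kmer feature vectors, the sketch below counts overlapping kmers in two sequences and compares the count vectors. It is written in Python, is not the package's own code, and makes no claim about how FindMyFriends builds or weights its vectors. The function names and the default `k=3` are assumptions chosen for the example.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k=3):
    """Count overlapping kmers; a sequence shorter than k yields no kmers."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(count * b[kmer] for kmer, count in a.items() if kmer in b)
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # no kmers to compare
    return dot / (norm_a * norm_b)


def kmer_similarity(seq1, seq2, k=3):
    """Alignment-free similarity of two sequences (hypothetical helper)."""
    return cosine_similarity(kmer_counts(seq1, k), kmer_counts(seq2, k))
```

Identical sequences score 1 and sequences sharing no kmers score 0. No alignment is computed, so the cost per pair depends only on the number of distinct kmers in each sequence.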

Author(s)

Thomas Lin Pedersen


FindMyFriends documentation built on Nov. 8, 2020, 6:46 p.m.