mash | R Documentation
Mash (Fast genome and metagenome distance estimation using MinHash) is a fast sequence distance estimator based on the MinHash algorithm. It is designed to work with genomes and metagenomes, provided either as assemblies or as reads (https://mash.readthedocs.io/). This function is a wrapper that runs mash in the background and imports the results into R as a mash object.
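The MinHash idea behind Mash can be illustrated in a few lines of base R. This is a minimal, hypothetical sketch, not code from this package or from the mash binary: the helper names (`kmer_hash`, `sketch_seq`, `jaccard_estimate`, `mash_distance`) are invented, the toy polynomial hash stands in for the hash function mash actually uses, and reverse complements are ignored. Here `k` and `s` play the roles of the `kmer` and `sketch` arguments, and the distance follows the formula from the Mash paper, D = -(1/k) ln(2j / (1 + j)), where j is the Jaccard index estimated from the sketches.

```r
# Minimal MinHash illustration in base R. Hypothetical helpers, not part of
# this package; mash itself uses a different hash and canonical k-mers.

# Toy polynomial hash of a k-mer, reduced modulo the prime 2^31 - 1 so every
# intermediate value stays exactly representable as a double.
kmer_hash <- function(kmer, base = 31, modulus = 2147483647) {
  h <- 0
  for (code in utf8ToInt(kmer)) {
    h <- (h * base + code) %% modulus
  }
  h
}

# All overlapping substrings of length k.
kmers_of <- function(seq, k) {
  n <- nchar(seq)
  if (n < k) return(character(0))
  substring(seq, 1:(n - k + 1), k:n)
}

# Bottom-s sketch: the s smallest distinct k-mer hashes.
sketch_seq <- function(seq, k, s) {
  hashes <- unique(vapply(kmers_of(seq, k), kmer_hash, numeric(1),
                          USE.NAMES = FALSE))
  head(sort(hashes), s)
}

# Jaccard estimate: among the s smallest hashes of the union of both
# sketches, the fraction that appears in both sketches.
jaccard_estimate <- function(sa, sb, s) {
  union_sketch <- head(sort(union(sa, sb)), s)
  if (length(union_sketch) == 0) return(0)
  sum(union_sketch %in% sa & union_sketch %in% sb) / length(union_sketch)
}

# Mash distance D = -(1/k) * ln(2j / (1 + j)); returns 1 when j = 0
# (no shared hashes) to avoid log(0).
mash_distance <- function(j, k) {
  if (j == 0) return(1)
  -log(2 * j / (1 + j)) / k
}

# Usage: two toy sequences that differ by a single substitution.
seq_a <- "ACGTACGTTGCAACGTAGCTAGGCTAACGTTAGC"
seq_b <- "ACGTACGTTGCAACGTAGCTTGGCTAACGTTAGC"
j <- jaccard_estimate(sketch_seq(seq_a, 5, 50), sketch_seq(seq_b, 5, 50), 50)
d <- mash_distance(j, 5)
cat("Jaccard estimate:", j, " Mash distance:", d, "\n")
```

Keeping only the s smallest hashes is what makes the comparison cheap: two genomes are compared through at most s values each, regardless of genome size, at the cost of an estimation error that shrinks as s grows.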
mash(file_list, n_cores = 4, sketch = 1000, kmer = 21, type = "prot")
| Argument | Description |
| --- | --- |
| file_list | Data frame with the full paths to the genome files (gene or protein multi-FASTA). |
| n_cores | Number of cores to use. |
| sketch | Sketch size, i.e. the number of min-hashes kept per genome for distance estimation. |
| kmer | K-mer size. |
| type | Type of sequence: 'nucl' (nucleotides) or 'prot' (amino acids). |
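A hedged usage sketch for preparing `file_list`. It assumes protein multi-FASTA files with a `.faa` extension stored in one directory; the directory, the file names and the column name `files` are hypothetical, since this page only says the data frame must hold the full paths. Two toy files are written first so the snippet runs on its own, and the final `mash()` call is commented out because it needs this package loaded and the mash executable installed.

```r
# Hypothetical directory holding one protein multi-FASTA file per genome.
fasta_dir <- file.path(tempdir(), "proteomes")
dir.create(fasta_dir, showWarnings = FALSE)

# Write two toy proteomes so the example is self-contained.
writeLines(c(">g1_p1", "MKVLAAGIVG", ">g1_p2", "MSTNPKPQRK"),
           file.path(fasta_dir, "genome1.faa"))
writeLines(c(">g2_p1", "MKVLAAGLVG"),
           file.path(fasta_dir, "genome2.faa"))

# One row per genome; full.names = TRUE returns full paths, not bare names.
file_list <- data.frame(
  files = list.files(fasta_dir, pattern = "\\.faa$", full.names = TRUE),
  stringsAsFactors = FALSE
)
print(file_list)

# With the package loaded and mash installed, protein files would be used as:
# mash_obj <- mash(file_list, n_cores = 4, sketch = 1000, kmer = 21, type = "prot")
```

For nucleotide files the same data frame would be built from the gene multi-FASTA files and passed with `type = "nucl"`.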
A mash object: a list of two elements.
The first element is a square, symmetric matrix of pairwise distances between genomes, with the genome names as row and column names.
The second element is a data.table/data.frame listing the same distances in long format, with the columns c("Source", "Target", "Dist").
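The two elements of the returned list hold the same distances in two layouts, and the sketch below converts between them with base R only. The toy matrix values are made up, and the helper names `matrix_to_table()` and `table_to_matrix()` are hypothetical; with a real result the inputs would be `mash_obj[[1]]` and `mash_obj[[2]]`. This page does not say whether the table includes self-comparisons or both orderings of each pair, so `matrix_to_table()` keeps all n^2 pairs and `table_to_matrix()` fills both triangles.

```r
# Toy stand-in for the first element of a mash object (made-up values).
genomes <- c("genomeA", "genomeB", "genomeC")
dist_mat <- matrix(c(0,    0.02, 0.15,
                     0.02, 0,    0.14,
                     0.15, 0.14, 0),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(genomes, genomes))

# Matrix -> long table with columns Source, Target, Dist.
# as.vector() reads column by column, so the row name varies fastest.
matrix_to_table <- function(m) {
  data.frame(
    Source = rep(rownames(m), times = ncol(m)),
    Target = rep(colnames(m), each = nrow(m)),
    Dist   = as.vector(m),
    stringsAsFactors = FALSE
  )
}

# Long table -> symmetric matrix, indexing by genome name.
table_to_matrix <- function(tab) {
  src <- as.character(tab$Source)
  tgt <- as.character(tab$Target)
  ids <- sort(unique(c(src, tgt)))
  m <- matrix(NA_real_, length(ids), length(ids), dimnames = list(ids, ids))
  m[cbind(src, tgt)] <- tab$Dist
  m[cbind(tgt, src)] <- tab$Dist  # fill the other triangle as well
  d <- diag(m)
  d[is.na(d)] <- 0                # self-distance is 0 if not listed
  diag(m) <- d
  m
}

tab <- matrix_to_table(dist_mat)
print(head(tab))

# Closest pair of distinct genomes.
off_diag <- tab[tab$Source != tab$Target, ]
print(off_diag[which.min(off_diag$Dist), ])
```

The matrix layout suits clustering or heatmaps, while the long table is convenient for filtering pairs below a distance threshold or building a network.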
Mash: fast genome and metagenome distance estimation using MinHash. Ondov BD, Treangen TJ, Melsted P, Mallonee AB, Bergman NH, Koren S, Phillippy AM. Genome Biol. 2016 Jun 20;17(1):132. doi: 10.1186/s13059-016-0997-x.
Mash Screen: High-throughput sequence containment estimation for genome discovery. Ondov BD, Starrett GJ, Sappington A, Kostic A, Koren S, Buck CB, Phillippy AM. bioRxiv. 2019 Mar. doi: 10.1101/557314.