mash: MASH distance estimation.

View source: R/mash.R

mash    R Documentation



Mash (fast genome and metagenome distance estimation using MinHash) is a fast sequence distance estimator that uses the MinHash algorithm and is designed to work with genomes and metagenomes in the form of assemblies or reads. This function is a wrapper that runs mash in the background and imports the result into R as a mash object.
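For orientation, the distance that Mash reports can be sketched from a Jaccard estimate, following the formula in Ondov et al. (2016). This is a minimal illustration and not the wrapper's own code: the function name mash_distance and its arguments are hypothetical, and the real computation is done by the mash binary.

```r
# Hypothetical helper: Mash distance from a MinHash Jaccard estimate.
# jaccard: fraction of shared hashes between two sketches (0 to 1)
# kmer:    k-mer size used to build the sketches
mash_distance <- function(jaccard, kmer) {
  # With no shared hashes the distance is reported as 1
  if (jaccard <= 0) {
    return(1)
  }
  # D = -(1 / k) * ln(2j / (1 + j))
  -1 / kmer * log(2 * jaccard / (1 + jaccard))
}

d_identical <- mash_distance(1, kmer = 21)   # identical sketches
d_half      <- mash_distance(0.5, kmer = 21) # half of the hashes shared
d_disjoint  <- mash_distance(0, kmer = 21)   # no hashes shared
```

A larger sketch gives a more precise Jaccard estimate at the cost of more computation, which is the trade-off behind the sketch argument.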


mash(file_list, n_cores = 4, sketch = 1000, kmer = 21, type = "prot")



file_list: Data frame with the full path to the genome files (gene or protein multi-FASTA).


n_cores: Number of cores to use.


sketch: Sketch size (number of hashes per sketch) to use for distance estimation.


kmer: K-mer size.


type: Type of sequence, 'nucl' (nucleotides) or 'prot' (amino acids).
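A call might look like the sketch below. It assumes that the function is exported by the PATO package, that file_list is a one-column data frame of paths, and that the example paths (which are hypothetical) are replaced with real multi-FASTA files. The call is guarded so that nothing runs unless the package and the files are present.

```r
# Hypothetical paths: replace with the full paths to real multi-FASTA files.
fasta_paths <- c("/data/genomes/sample_A.faa",
                 "/data/genomes/sample_B.faa")

# One-column data frame holding the paths (assumed input layout).
file_list <- data.frame(V1 = fasta_paths, stringsAsFactors = FALSE)

# Run only when PATO is installed and every file exists.
can_run <- requireNamespace("PATO", quietly = TRUE) &&
  all(file.exists(file_list$V1))

if (can_run) {
  my_mash <- PATO::mash(file_list, n_cores = 4, sketch = 1000,
                        kmer = 21, type = "prot")
  # Inspect the top-level structure of the returned object
  str(my_mash, max.level = 1)
}
```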


A mash object


A mash object is a list of two elements.

The first is a square, symmetric matrix of the pairwise distances among genomes, with the genome names as row names and column names.

The second is a data.table/data.frame that lists all the pairwise distances in long format. The table has the columns c("Source","Target","Dist").
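The relationship between the two elements can be illustrated with a toy example in base R. The genome names and distance values are made up, and the list is left unnamed because the element names of a real mash object are not stated here.

```r
# Toy long-format table with the documented columns (values are made up).
dist_table <- data.frame(
  Source = c("gA", "gA", "gA", "gB", "gB", "gB", "gC", "gC", "gC"),
  Target = c("gA", "gB", "gC", "gA", "gB", "gC", "gA", "gB", "gC"),
  Dist   = c(0, 0.12, 0.30, 0.12, 0, 0.25, 0.30, 0.25, 0),
  stringsAsFactors = FALSE
)

# Build the square matrix, using genome names for rows and columns.
genomes <- sort(unique(c(dist_table$Source, dist_table$Target)))
dist_matrix <- matrix(NA_real_,
                      nrow = length(genomes), ncol = length(genomes),
                      dimnames = list(genomes, genomes))

# Fill each cell by (Source, Target) name pair.
dist_matrix[cbind(dist_table$Source, dist_table$Target)] <- dist_table$Dist

# A two-element list mirroring the described layout: matrix, then table.
toy_mash <- list(dist_matrix, dist_table)
```

Here toy_mash[[1]]["gA", "gB"] and the matching row of toy_mash[[2]] hold the same distance, which is the sense in which the table restates the matrix.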


Mash: fast genome and metagenome distance estimation using MinHash. Ondov BD, Treangen TJ, Melsted P, Mallonee AB, Bergman NH, Koren S, Phillippy AM. Genome Biol. 2016 Jun 20;17(1):132. doi: 10.1186/s13059-016-0997-x.

Mash Screen: High-throughput sequence containment estimation for genome discovery. Ondov BD, Starrett GJ, Sappington A, Kostic A, Koren S, Buck CB, Phillippy AM. bioRxiv. 2019 Mar. doi: 10.1101/557314.

irycisBioinfo/PATO documentation built on July 19, 2022, 7:21 a.m.