View source: R/pangenomes_mmseq.R
pangenomes_mmseqs | R Documentation |
This is an internal function for the pangenomes workflow.
pangenomes_mmseqs(
file_list,
coverage,
identity,
evalue,
n_cores,
cov_mode,
cluster_mode,
folder
)
file_list |
Data frame with the full path to the genome files (gene or protein multi-fasta). |
coverage |
Minimum coverage (length) to cluster. |
identity |
Minimum identity. |
evalue |
Maximum e-value. |
n_cores |
Number of cores to use. |
cov_mode |
Coverage mode. |
cluster_mode |
Cluster mode. |
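As a rough sketch of how the arguments above might be assembled, the snippet below builds a `file_list` data frame from a folder of FASTA files and collects the remaining arguments in a list. The column name `File`, the `.faa` extension, the temporary folders, and every parameter value (including whether identity and coverage are fractions) are assumptions made for illustration. The function is internal, so the input layout it actually expects may differ.

```r
# Hypothetical input folder with two empty placeholder FASTA files.
genome_dir <- file.path(tempdir(), "genomes_example")
dir.create(genome_dir, showWarnings = FALSE)
invisible(file.create(file.path(genome_dir, c("genomeA.faa", "genomeB.faa"))))

# One row per genome, holding the full path to its multi-fasta file.
# The column name `File` is an assumption, not taken from the package.
file_list <- data.frame(
  File = normalizePath(
    list.files(genome_dir, pattern = "\\.faa$", full.names = TRUE)
  ),
  stringsAsFactors = FALSE
)

# Illustrative values only; the accepted ranges and modes are not
# documented on this page.
args <- list(
  file_list    = file_list,
  coverage     = 0.8,
  identity     = 0.8,
  evalue       = 1e-6,
  n_cores      = 2,
  cov_mode     = 0,
  cluster_mode = 0,
  folder       = file.path(tempdir(), "mmseqs_example")  # hypothetical path
)

# With the package loaded and MMseqs2 installed, the call would look like:
# result <- do.call(pangenomes_mmseqs, args)
```

Keeping the arguments in a named list makes it easy to check them against the usage signature before the call is made.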
Returns an mmseq object.
An mmseq object is a list of two elements. The first is a data.table/data.frame with four columns (Prot_genome, Prot_Prot, Genome_genome and Genome_Prot). This is the output of MMseqs2 and describes the clustering of the input genes/proteins. The first column refers to the genome that contains the representative gene/protein of the cluster. The second is the representative protein of the cluster (i.e. the cluster name). The third column is the genome that contains the gene/protein listed in the fourth column.
The second element is a data.frame/data.table with the original annotation of the representative gene/protein of each cluster, in two columns. The first column, Prot_prot, holds the same identifiers as the second column of the first element.
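The layout described above can be illustrated with a small hand-made stand-in for an mmseq object. This is only a sketch: the data are invented, and the list element names (`clusters`, `annot`) and the annotation column name `Annot` are hypothetical. Only the column names Prot_genome, Prot_Prot, Genome_genome, Genome_Prot and Prot_prot come from the description.

```r
# Toy stand-in for the first element: one row per clustered gene/protein.
clusters <- data.frame(
  Prot_genome   = c("genomeA", "genomeA", "genomeA", "genomeB"),
  Prot_Prot     = c("A_001", "A_001", "A_002", "B_007"),
  Genome_genome = c("genomeA", "genomeB", "genomeA", "genomeB"),
  Genome_Prot   = c("A_001", "B_003", "A_002", "B_007"),
  stringsAsFactors = FALSE
)

# Toy stand-in for the second element: annotation of each representative.
# The column name `Annot` is hypothetical.
annotation <- data.frame(
  Prot_prot = c("A_001", "A_002", "B_007"),
  Annot     = c("DNA gyrase subunit A", "hypothetical protein", "transposase"),
  stringsAsFactors = FALSE
)

# Element names `clusters` and `annot` are hypothetical.
mmseq_like <- list(clusters = clusters, annot = annotation)

# Number of members per cluster (clusters are named by their representative).
cluster_sizes <- table(mmseq_like$clusters$Prot_Prot)

# Presence/absence of each cluster in each genome.
counts <- unclass(table(mmseq_like$clusters$Prot_Prot,
                        mmseq_like$clusters$Genome_genome))
presence <- counts > 0

# Clusters found in every genome (a simple "core" set).
core <- rownames(presence)[rowSums(presence) == ncol(presence)]
```

Because every row of the first element pairs a cluster representative with one member and its genome, a presence/absence matrix falls out of a single cross-tabulation of the second and third columns.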
Steinegger M and Soeding J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nature Biotechnology, doi: 10.1038/nbt.3988 (2017).
Steinegger M and Soeding J. Clustering huge protein sequence sets in linear time. Nature Communications, doi: 10.1038/s41467-018-04964-5 (2018).
Mirdita M, Steinegger M and Soeding J. MMseqs2 desktop and local web server app for fast, interactive sequence searches. Bioinformatics, doi: 10.1093/bioinformatics/bty1057 (2019).