pangenomes_mmseqs: Final step of pangenomes homologous cluster

View source: R/pangenomes_mmseq.R

pangenomes_mmseqs    R Documentation

Final step of pangenomes homologous cluster

Description

This is an internal function for the pangenomes workflow.

Usage

pangenomes_mmseqs(
  file_list,
  coverage,
  identity,
  evalue,
  n_cores,
  cov_mode,
  cluster_mode,
  folder
)

Arguments

file_list

Data frame with the full path to the genome files (gene or protein multi-fasta).

coverage

Minimum coverage (length) to cluster.

identity

Minimum identity.

evalue

Maximum E-value.

n_cores

Number of cores to use.

cov_mode

Coverage mode:

  • 0: Coverage of query and target

  • 1: Coverage of target

  • 2: Coverage of query

  • 3: Target seq. length needs to be at least x% of query length

  • 4: Query seq. length needs to be at least x% of target length
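
As a rough illustration of how the coverage modes above differ, the following sketch applies each rule to a single pairwise alignment. It is a simplified reading of the option descriptions, not the MMseqs2 implementation: passes_coverage() and its argument names are hypothetical, a single alignment length is assumed for both sequences, and mode 4 is assumed to mirror mode 3, as in the MMseqs2 help text.

```r
# Illustrative only: a simplified reading of the cov_mode options.
# passes_coverage() and its arguments are hypothetical names, not part of
# PATO or MMseqs2. min_cov is a fraction between 0 and 1.
passes_coverage <- function(cov_mode, min_cov, aln_len, query_len, target_len) {
  query_cov  <- aln_len / query_len   # fraction of the query that is aligned
  target_cov <- aln_len / target_len  # fraction of the target that is aligned
  switch(as.character(cov_mode),
    "0" = query_cov >= min_cov && target_cov >= min_cov,
    "1" = target_cov >= min_cov,
    "2" = query_cov >= min_cov,
    "3" = target_len >= min_cov * query_len,
    "4" = query_len >= min_cov * target_len,
    stop("unsupported cov_mode: ", cov_mode)
  )
}

# An 80-residue alignment between a 100-residue query and a 200-residue target,
# with a minimum coverage of 0.7
both_ok   <- passes_coverage(0, 0.7, aln_len = 80, query_len = 100, target_len = 200)
target_ok <- passes_coverage(1, 0.7, aln_len = 80, query_len = 100, target_len = 200)
query_ok  <- passes_coverage(2, 0.7, aln_len = 80, query_len = 100, target_len = 200)
```

In this toy case the query is 80% covered but the target only 40%, so the pair passes under mode 2 and fails under modes 0 and 1.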

cluster_mode

Cluster mode:

  • 0: Setcover

  • 1: Connected component

  • 2: Greedy clustering by sequence length

  • 3: Greedy clustering by sequence length (low mem)

Value

Returns a mmseq object.

Note

A mmseq object is a list of two elements. The first contains a data.table/data.frame with four columns (Prot_genome, Prot_Prot, Genome_genome and Genome_Prot). This is the output of MMseqs2 and describes the clustering of the input genes/proteins. The first column refers to the genome that contains the representative gene/protein of the cluster. The second is the representative protein of the cluster (i.e. the cluster name). The third column is the genome that contains the gene/protein listed in the fourth column.

The second element is a data.frame/data.table with the original annotation of the representative gene/protein of each cluster, in two columns. The first column, Prot_prot, is the same as the second column of the first element.
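
To make the layout described in this note concrete, the sketch below builds a toy list with the same two-element shape and summarises it with base R. All genome and protein identifiers are invented, the name of the second annotation column (Annot) is an assumption because it is not given here, and a real mmseq object returned by PATO may carry element names or a class attribute that this toy list lacks.

```r
# Toy data only: identifiers are invented and the "Annot" column name is assumed.
clusters <- data.frame(
  Prot_genome   = c("genomeA", "genomeA", "genomeB"),  # genome of the representative
  Prot_Prot     = c("protA1", "protA1", "protB7"),     # representative (cluster name)
  Genome_genome = c("genomeA", "genomeB", "genomeB"),  # genome of the cluster member
  Genome_Prot   = c("protA1", "protB3", "protB7"),     # cluster member
  stringsAsFactors = FALSE
)
annotation <- data.frame(
  Prot_prot = c("protA1", "protB7"),
  Annot     = c("hypothetical protein", "toy annotation"),
  stringsAsFactors = FALSE
)
toy <- list(clusters, annotation)

# Number of member sequences in each cluster
cluster_sizes <- table(toy[[1]]$Prot_Prot)

# Number of distinct genomes represented in each cluster
genomes_per_cluster <- tapply(
  toy[[1]]$Genome_genome,
  toy[[1]]$Prot_Prot,
  function(g) length(unique(g))
)
```

Here the cluster named protA1 has two members drawn from two genomes, while protB7 contains only its own representative.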

References

Steinegger M and Soeding J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nature Biotechnology, doi: 10.1038/nbt.3988 (2017).

Steinegger M and Soeding J. Clustering huge protein sequence sets in linear time. Nature Communications, doi: 10.1038/s41467-018-04964-5 (2018).

Mirdita M, Steinegger M and Soeding J. MMseqs2 desktop and local web server app for fast, interactive sequence searches. Bioinformatics, doi: 10.1093/bioinformatics/bty1057 (2019).


irycisBioinfo/PATO documentation built on Oct. 19, 2023, 3:07 p.m.