postcluster_pq: Recluster sequences of an object of class 'physeq' or a list...

View source: R/dada_phyloseq.R

postcluster_pq    R Documentation

Recluster sequences of an object of class physeq or a list of DNA sequences

Description

Lifecycle: maturing

This function uses merge_taxa_vec() to merge taxa into clusters.

Usage

postcluster_pq(
  physeq = NULL,
  dna_seq = NULL,
  nproc = 1,
  method = "clusterize",
  id = 0.97,
  vsearchpath = find_vsearch(),
  tax_adjust = 0,
  rank_propagation = FALSE,
  vsearch_cluster_method = "--cluster_size",
  vsearch_args = "--strand both",
  keep_temporary_files = FALSE,
  swarmpath = "swarm",
  d = 1,
  swarm_args = "--fastidious",
  mmseqs2path = find_mmseqs2(),
  mmseqs2_cluster_method = "easy-cluster",
  mmseqs2_args = "",
  method_clusterize = "overlap",
  ...
)

asv2otu(
  physeq = NULL,
  dna_seq = NULL,
  nproc = 1,
  method = "clusterize",
  id = 0.97,
  vsearchpath = find_vsearch(),
  tax_adjust = 0,
  rank_propagation = FALSE,
  vsearch_cluster_method = "--cluster_size",
  vsearch_args = "--strand both",
  keep_temporary_files = FALSE,
  swarmpath = "swarm",
  d = 1,
  swarm_args = "--fastidious",
  mmseqs2path = find_mmseqs2(),
  mmseqs2_cluster_method = "easy-cluster",
  mmseqs2_args = "",
  method_clusterize = "overlap",
  ...
)

Arguments

physeq

(required unless dna_seq is set) A phyloseq-class object obtained using the phyloseq package.

dna_seq

A character vector of DNA sequences to use in place of the physeq argument. When physeq is set, the DNA sequences are taken from physeq@refseq.

nproc

(default: 1) Number of CPUs/processors to use for clustering.

method

(default: "clusterize") The clustering method:

  • clusterize uses the DECIPHER::Clusterize() function.

  • vsearch uses the vsearch software (https://github.com/torognes/vsearch) with the arguments --cluster_size by default (see argument vsearch_cluster_method) and --strand both (see argument vsearch_args).

  • swarm uses the swarm software (https://github.com/torognes/swarm).

  • mmseqs2 uses the MMseqs2 software (https://github.com/soedinglab/MMseqs2) with easy-cluster by default (see argument mmseqs2_cluster_method).

id

(default: 0.97) Minimum identity level (as a proportion, e.g. 0.97 for 97%) for sequences to be clustered together.
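As a rough illustration of what the id threshold means, the sketch below uses a simplified, ungapped definition of identity: the proportion of matching positions between two sequences of equal length. This is an assumption for illustration only, since vsearch, DECIPHER::Clusterize() and MMseqs2 align sequences first and each define identity in their own way. The helpers ungapped_identity() and max_mismatches() are hypothetical.

```r
# Simplified identity for two equal-length sequences without gaps:
# the proportion of positions where the nucleotides match.
# Real clustering tools align sequences first and use their own definition.
ungapped_identity <- function(a, b) {
  stopifnot(nchar(a) == nchar(b))
  mean(strsplit(a, "")[[1]] == strsplit(b, "")[[1]])
}

# Largest number of mismatches compatible with `id` for a given length.
# The small tolerance guards against floating-point error, e.g.
# (1 - 0.9) * 100 is slightly below 10 in double precision.
max_mismatches <- function(seq_length, id = 0.97) {
  floor((1 - id) * seq_length + 1e-9)
}

ungapped_identity("ACGTACGTAC", "ACGTACGTAA")  # 9 of 10 positions match
max_mismatches(250, id = 0.97)  # 0.03 * 250 = 7.5, so at most 7 mismatches
```

With the default id = 0.97, two 250 bp amplicons would thus need to differ at no more than 7 positions under this simplified definition.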

vsearchpath

(default: find_vsearch()) Path to the vsearch executable.

tax_adjust

(default: 0) See the man page of merge_taxa_vec() for more details. To conserve the taxonomic ranks of the most abundant taxon (ASV, OTU, ...) in each cluster, set tax_adjust to 0 (the default). For the moment, only tax_adjust = 0 is robust.
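The toy sketch below illustrates the idea behind tax_adjust = 0 as described above: each cluster keeps the taxonomy of its most abundant member. The data and the helper pick_top_taxonomy() are hypothetical and only base R is used; the actual behaviour, including how ties and NA values are handled, is defined by merge_taxa_vec().

```r
# Hypothetical toy data: three ASVs grouped into two clusters
tax <- data.frame(
  Genus = c("Russula", "Lactarius", "Amanita"),
  row.names = c("ASV1", "ASV2", "ASV3"),
  stringsAsFactors = FALSE
)
abund <- c(ASV1 = 10, ASV2 = 50, ASV3 = 5)  # total abundance of each ASV
cluster <- c(ASV1 = "c1", ASV2 = "c1", ASV3 = "c2")

# Keep, for each cluster, the taxonomy row of its most abundant member
pick_top_taxonomy <- function(tax, abund, cluster) {
  top <- tapply(names(abund), cluster[names(abund)],
                function(ids) ids[which.max(abund[ids])])
  out <- tax[as.character(top), , drop = FALSE]
  rownames(out) <- names(top)
  out
}

top_tax <- pick_top_taxonomy(tax, abund, cluster)
top_tax  # c1 takes the genus of ASV2 (abundance 50), c2 that of ASV3
```

Here which.max() returns the first maximum, so ties go to the first listed member; merge_taxa_vec() may break ties differently.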

rank_propagation

(logical, default: FALSE) Whether to propagate NA values from lower taxonomic ranks to upper ranks. See the man page of merge_taxa_vec() for more details.

vsearch_cluster_method

(default: "--cluster_size") See the vsearch manual for other possible methods (e.g. --cluster_size or --cluster_fast).

  • --cluster_fast: cluster the FASTA sequences after automatically sorting them by decreasing sequence length.

  • --cluster_size: cluster the FASTA sequences after automatically sorting them by decreasing sequence abundance.
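The two methods differ in the order in which sequences reach the greedy clustering step: in vsearch, a sequence that matches no existing centroid becomes a new centroid, so sequences processed first are favoured as centroids. A minimal base-R sketch of both orders, on hypothetical sequences and abundances:

```r
# Hypothetical sequences and abundances
seqs <- c(s1 = "ACGTACGT", s2 = "ACGTACGTACGT", s3 = "ACGTAC")
abund <- c(s1 = 120, s2 = 15, s3 = 300)

# --cluster_fast: process sequences by decreasing length
fast_order <- names(seqs)[order(nchar(seqs), decreasing = TRUE)]
# --cluster_size: process sequences by decreasing abundance
size_order <- names(seqs)[order(abund[names(seqs)], decreasing = TRUE)]

fast_order  # "s2" "s1" "s3"
size_order  # "s3" "s1" "s2"
```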

vsearch_args

(default: "--strand both") A character string of length one defining other parameters to pass on to vsearch.
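The sketch below shows one way a wrapper could assemble a vsearch call around vsearch_args. It is an assumption, not the command MiscMetabar actually builds: the helper build_vsearch_args() is hypothetical, while --cluster_size, --id, --centroids, --uc, --threads and --strand are standard vsearch options. The file names reuse those listed under keep_temporary_files.

```r
# Hypothetical helper: assemble (but do not run) a vsearch argument vector.
# The command built by MiscMetabar itself may differ.
build_vsearch_args <- function(vsearch_cluster_method = "--cluster_size",
                               id = 0.97,
                               nproc = 1,
                               vsearch_args = "--strand both",
                               input = "temp.fasta") {
  # vsearch_args is expected to be a single string
  stopifnot(is.character(vsearch_args), length(vsearch_args) == 1)
  c(vsearch_cluster_method, input,
    "--id", format(id),
    "--centroids", "cluster.fasta",
    "--uc", "temp.uc",
    "--threads", format(nproc),
    vsearch_args)
}

cmd_args <- build_vsearch_args()
cmd_args
# To run it (requires vsearch on the PATH):
# system2("vsearch", cmd_args)
```

Because system2() pastes its arguments together without quoting them, a single string such as "--strand both" reaches vsearch as two separate tokens, which is why several options can be combined in one string.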

keep_temporary_files

(logical, default: FALSE) Whether to keep the temporary files:

  • temp.fasta (refseq in fasta or dna_seq sequences)

  • cluster.fasta (centroid if method = "vsearch")

  • temp.uc (clusters if method = "vsearch")

swarmpath

(default: "swarm") Path to the swarm executable.

d

(default: 1) Maximum number of differences allowed between two amplicons: two amplicons are grouped if they have d or fewer differences.
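The d criterion can be illustrated with base R: utils::adist() computes the Levenshtein distance (substitutions, insertions and deletions each count as one difference), and single-linkage clustering cut at height d groups amplicons connected by chains of pairs with d or fewer differences. This is a simplification: swarm grows clusters iteratively from abundant seeds and, with --fastidious, grafts small clusters onto larger ones, so its results can differ. The sequences and the helper link_by_d() are hypothetical.

```r
# Hypothetical amplicons (same length here, but adist() also handles indels)
amplicons <- c(a = "ACGTACGTAC", b = "ACGTACGTAA",
               c = "ACGTTCGTAA", e = "ACGTACGTTT")

# Pairwise Levenshtein distances: number of differences between amplicons
dist_mat <- utils::adist(amplicons)
dimnames(dist_mat) <- list(names(amplicons), names(amplicons))

# Single-linkage clustering cut at height d: amplicons end up together when
# a chain of pairs with d or fewer differences connects them
link_by_d <- function(dist_mat, d = 1) {
  tree <- stats::hclust(stats::as.dist(dist_mat), method = "single")
  stats::cutree(tree, h = d)
}

link_by_d(dist_mat, d = 1)  # a, b and c linked; e is 2 differences away
link_by_d(dist_mat, d = 2)  # all four amplicons grouped
```

Note that a and c differ by 2 yet fall in the same group at d = 1, because both are one difference away from b.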

swarm_args

(default: "--fastidious") A character string of length one defining other parameters to pass on to swarm. See the swarm PDF manual for other possible options.

mmseqs2path

(default: find_mmseqs2()) Path to the MMseqs2 executable.

mmseqs2_cluster_method

(default: "easy-cluster") Either "easy-cluster" or "easy-linclust". See mmseqs2_clustering().

mmseqs2_args

(default: "") Additional arguments passed to the MMseqs2 clustering command.

method_clusterize

(default: "overlap") The method argument passed to DECIPHER::Clusterize() when method = "clusterize".

...

Additional arguments passed on to DECIPHER::Clusterize()

Details

This function uses merge_taxa_vec() to merge taxa into clusters. By default, tax_adjust = 0. See the man page of merge_taxa_vec().
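For the abundance table, merging taxa into clusters amounts to summing the rows of all members of each cluster. A minimal base-R sketch with a hypothetical taxa-by-samples matrix, assuming cluster memberships are already known; merge_taxa_vec() also updates the taxonomy, reference sequences and other phyloseq slots, which are not shown here.

```r
# Hypothetical abundance matrix: taxa in rows, samples in columns
otu <- matrix(c(10, 0,
                50, 3,
                 5, 7),
              ncol = 2, byrow = TRUE,
              dimnames = list(c("ASV1", "ASV2", "ASV3"), c("S1", "S2")))

# Hypothetical cluster membership of each ASV
cluster <- c(ASV1 = "c1", ASV2 = "c1", ASV3 = "c2")

# Sum the abundances of all ASVs belonging to the same cluster
merged <- rowsum(otu, group = cluster[rownames(otu)])
merged  # c1: S1 = 60, S2 = 3; c2: S1 = 5, S2 = 7
```

Summing preserves the total number of reads per sample, so sample depths are unchanged by the merge.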

Value

A new phyloseq-class object, or a list of clusters if the dna_seq argument was used.

Author(s)

Adrien Taudière

References

VSEARCH can be downloaded from https://github.com/torognes/vsearch. More information is available in the associated publication: https://pubmed.ncbi.nlm.nih.gov/27781170.

See Also

vsearch_clustering(), swarm_clustering(), and mmseqs2_clustering()

Examples


if (requireNamespace("DECIPHER")) {
  postcluster_pq(data_fungi_mini)
}

## Not run: 
if (requireNamespace("DECIPHER")) {
  postcluster_pq(data_fungi_mini, method_clusterize = "longest")

  if (MiscMetabar::is_swarm_installed()) {
    d_swarm <- postcluster_pq(data_fungi_mini, method = "swarm")
  }
  if (MiscMetabar::is_vsearch_installed()) {
    d_vs <- postcluster_pq(data_fungi_mini, method = "vsearch")
  }
  if (MiscMetabar::is_mmseqs2_installed()) {
    d_mm <- postcluster_pq(data_fungi_mini, method = "mmseqs2")
  }
}

## End(Not run)

MiscMetabar documentation built on June 8, 2026, 5:07 p.m.