get_mixed_clusters: Get mixed clusters

get_mixed_clusters    R Documentation

Get mixed clusters

Description

Cluster sequences at a given similarity threshold and find clusters that contain mixed taxonomic names.

Note: the k-means clustering is randomised, so for reproducible results either call set.seed() beforehand or supply a seed through rngseed.

Usage

get_mixed_clusters(
  x,
  db,
  rank = "order",
  threshold = 0.97,
  rngseed = FALSE,
  confidence = 0.8,
  return = "consensus",
  k = 5,
  quiet = FALSE,
  ...
)

Arguments

x

A DNAbin list object whose names include NCBI taxonomic identification numbers.

db

A taxonomic database from get_ncbi_taxonomy or get_ott_lineage.

rank

The taxonomic rank at which to check clusters. Accepts a character string such as "order", or a character vector such as c("species", "genus"). If "all", the clusters will be checked at all available taxonomic ranks.

threshold

numeric between 0 and 1 giving the OTU identity cutoff for clustering. Defaults to 0.97.

rngseed

(Optional) A single integer value passed to set.seed, which is used to fix a seed for reproducible random number generation in the k-means clustering. If set to FALSE, the RNG seed is left untouched, and it is up to the user to call set.seed beforehand to achieve reproducible results.

confidence

The minimum confidence value for a mixed cluster to be flagged. For example, if confidence = 0.8 (the default value), a cluster will only be flagged if the taxonomy of a sequence within the cluster differs from at least four other independent sequences in its cluster.

nstart

How many random sets should be chosen for kmeans. It is recommended to set the value of nstart to at least 20. While this can increase computation time, it can improve clustering accuracy considerably.

return

The type of result to return. Options include:

"consensus": the consensus taxonomy for each cluster and the associated confidence level.

"all": all taxa in mixed clusters and their sequence accession numbers.

"count": counts of all taxa within each cluster.

k

integer giving the k-mer size used to generate the input matrix for k-means clustering.

quiet

logical indicating whether progress should be printed to the console.

...

further arguments to pass to kmer::otu.
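
The arithmetic behind the confidence example above can be sketched in base R. This is only an illustration of one reading of the description, in which confidence is the proportion of sequences in a cluster that share the most common name. The helper cluster_confidence and the taxon vectors are hypothetical and are not part of taxreturn.

```r
# Hypothetical helper (not part of taxreturn): proportion of sequences
# in a cluster that carry the most common taxonomic name.
cluster_confidence <- function(taxa) {
  counts <- table(taxa)
  max(counts) / length(taxa)
}

# One sequence disagrees with four others: 4 / 5 agreement.
taxa_5 <- c("Diptera", "Diptera", "Diptera", "Diptera", "Coleoptera")
conf_5 <- cluster_confidence(taxa_5)

# One sequence disagrees with three others: 3 / 4 agreement.
taxa_4 <- c("Diptera", "Diptera", "Diptera", "Coleoptera")
conf_4 <- cluster_confidence(taxa_4)

# Under this reading, a mixed cluster is flagged only when the
# agreement reaches the confidence cutoff.
cutoff <- 0.8
flag_5 <- length(unique(taxa_5)) > 1 && conf_5 >= cutoff
flag_4 <- length(unique(taxa_4)) > 1 && conf_4 >= cutoff
```

With the default cutoff, the five-sequence cluster meets the threshold and the four-sequence cluster does not, which matches the "at least four other independent sequences" wording.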

Examples

## Not run: 
seqs <- ape::read.FASTA("test.fa.gz")

# NCBI taxonomy
# db is assumed to be a taxonomic database created beforehand with get_ncbi_taxonomy
mixed <- get_mixed_clusters(
  seqs, db, rank = "species",
  threshold = 0.99, confidence = 0.8, quiet = FALSE
)

# OTT taxonomy
# db is assumed to be a taxonomic database created beforehand with get_ott_lineage
seqs <- map_to_ott(
  seqs, dir = "ott3.2", from = "ncbi",
  resolve_synonyms = TRUE, filter_bads = TRUE, remove_na = TRUE, quiet = FALSE
)

mixed <- get_mixed_clusters(
  seqs, db, rank = "species",
  threshold = 0.99, confidence = 0.6, quiet = FALSE
)

## End(Not run)
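
The effect of fixing the seed, as described for rngseed, can be seen with a small base R sketch. It calls stats::kmeans directly on a made-up numeric matrix standing in for k-mer counts. It does not call get_mixed_clusters, and the data and object names are assumptions for illustration only.

```r
# Toy matrix standing in for k-mer counts (hypothetical values):
# two well-separated groups of 10 rows each.
set.seed(1)
toy <- rbind(
  matrix(rnorm(40, mean = 0), ncol = 4),
  matrix(rnorm(40, mean = 5), ncol = 4)
)

# Setting the same seed immediately before each run makes the random
# starting centres, and therefore the cluster assignments, repeatable.
set.seed(42)
run1 <- stats::kmeans(toy, centers = 2, nstart = 20)

set.seed(42)
run2 <- stats::kmeans(toy, centers = 2, nstart = 20)

same_assignments <- identical(run1$cluster, run2$cluster)
```

Without the set.seed calls, cluster labels may differ between runs, which is why the function either takes a seed through rngseed or leaves it to the caller.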


alexpiper/taxreturn documentation built on Sept. 14, 2024, 7:56 p.m.