get_mixed_clusters: Get mixed clusters
In alexpiper/taxreturn: An R package for retrieving and curating public DNA barcode reference data

get_mixed_clusters

R Documentation

Get mixed clusters

Description

Cluster sequences at a given similarity threshold and find clusters that contain mixed taxonomic names.

Note: for reproducible results, set a seed with set.seed() before calling this function, or pass an integer to rngseed.
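
The effect of fixing the seed can be sketched with base R alone. The example below does not call get_mixed_clusters; it runs stats::kmeans on a small made-up matrix (`kmer_counts` is a hypothetical stand-in for a k-mer count matrix) to show that resetting the same seed before each run gives identical cluster assignments.

```r
# Toy numeric matrix standing in for a k-mer count matrix (hypothetical data):
# 10 rows drawn around 0 and 10 rows drawn around 5, 4 columns each.
set.seed(1)
kmer_counts <- rbind(
  matrix(rnorm(40, mean = 0), ncol = 4),
  matrix(rnorm(40, mean = 5), ncol = 4)
)

# Run k-means twice, resetting the same seed before each run.
set.seed(42)
fit_a <- kmeans(kmer_counts, centers = 2, nstart = 20)
set.seed(42)
fit_b <- kmeans(kmer_counts, centers = 2, nstart = 20)

# With identical seeds the random starts are identical, so the
# cluster assignments match exactly.
same_clusters <- identical(fit_a$cluster, fit_b$cluster)
print(same_clusters)
```

Without the second set.seed call, the two runs would draw different random starts and could label or split the clusters differently.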

Usage

get_mixed_clusters(
  x,
  db,
  rank = "order",
  threshold = 0.97,
  rngseed = FALSE,
  confidence = 0.8,
  return = "consensus",
  k = 5,
  quiet = FALSE,
  ...
)

Arguments

`x`	A DNAbin list object whose names include NCBI taxonomic identification numbers.
`db`	A taxonomic database from `get_ncbi_taxonomy` or `get_ott_lineage`.
`rank`	The taxonomic rank to check clusters at, accepts a character such as "order", or vector of characters such as c("species", "genus"). If "all", the clusters will be checked at all taxonomic ranks available.
`threshold`	Numeric between 0 and 1 giving the OTU identity cutoff for clustering. Defaults to 0.97.
`rngseed`	(Optional) A single integer value passed to set.seed, which fixes the seed so that the random number generation used by the k-means clustering is reproducible. If set to FALSE (the default), the RNG seed is left untouched, and it is up to the user to call set.seed beforehand to achieve reproducible results.
`confidence`	The minimum confidence value for a mixed cluster to be flagged. For example, if confidence = 0.8 (the default value), a cluster will only be flagged if the taxonomy of a sequence within the cluster differs from at least four other independent sequences in its cluster.
`nstart`	How many random sets should be chosen for `kmeans`. Setting nstart to at least 20 is recommended; this can increase computation time, but it can improve clustering accuracy considerably. This argument is not listed in the usage above.
`return`	What type of output should be returned. Options include: Consensus, the consensus taxonomy for each cluster and the associated confidence level; All, all taxa in mixed clusters and their sequence accession numbers; Count, counts of all taxa within each cluster. Defaults to "consensus".
`k`	Integer giving the k-mer size used to generate the input matrix for k-means clustering. Defaults to 5.
`quiet`	Logical indicating whether progress messages should be suppressed (TRUE) or printed to the console (FALSE, the default).
`...`	Further arguments to pass to `kmer::otu`.
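
The confidence example above (four agreeing sequences against one that differs gives 4/5 = 0.8) can be worked through in a few lines of base R. This is a sketch of one plausible reading of the argument description, not the package's implementation: `cluster_confidence` is a hypothetical helper, and treating confidence as the share of the majority taxon is an assumption.

```r
# Hypothetical helper: summarise the taxon names found in one cluster.
# Assumption: "confidence" is the proportion of sequences that carry
# the most common taxon name in the cluster.
cluster_confidence <- function(taxa) {
  counts <- table(taxa)
  top <- counts[which.max(counts)]
  list(
    consensus  = names(top),
    confidence = as.numeric(top) / length(taxa),
    mixed      = length(counts) > 1
  )
}

# Made-up cluster: four sequences agree, one differs.
taxa <- c("Diptera", "Diptera", "Diptera", "Diptera", "Coleoptera")
res <- cluster_confidence(taxa)

# Flag the cluster only if it is mixed and the majority is strong enough.
threshold <- 0.8
flagged <- res$mixed && res$confidence >= threshold
print(res)
print(flagged)
```

Under this reading, a cluster split three to two would have a confidence of 0.6 and would only be flagged if confidence were lowered to 0.6 or below.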

Examples

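## Not run: 
# Read the reference sequences as a DNAbin object
seqs <- ape::read.FASTA("test.fa.gz")

# `db` must be created beforehand (see the `db` argument);
# it is not defined in this example.

# NCBI taxonomy
mixed <- get_mixed_clusters(
  seqs, db, rank = "species",
  threshold = 0.99, confidence = 0.8, quiet = FALSE
)

# OTT taxonomy: map the sequence names from NCBI to OTT identifiers first
seqs <- map_to_ott(
  seqs, dir = "ott3.2", from = "ncbi",
  resolve_synonyms = TRUE, filter_bads = TRUE, remove_na = TRUE, quiet = FALSE
)

# Same check with a lower confidence cutoff
mixed <- get_mixed_clusters(
  seqs, db, rank = "species",
  threshold = 0.99, confidence = 0.6, quiet = FALSE
)

## End(Not run)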

alexpiper/taxreturn documentation built on Sept. 14, 2024, 7:56 p.m.
