isContaminant: Identify contaminant sequences.

View source: R/decontam.R

Description

Contaminant sequences are identified from the frequency of each sequence (or OTU) in the input feature table as a function of the concentration of amplified DNA in each sample, from the prevalence of each sequence in negative controls, or from both.

Usage

isContaminant(seqtab, conc = NULL, neg = NULL, method = c("auto",
  "frequency", "prevalence", "combined", "minimum", "either", "both"),
  batch = NULL, batch.combine = c("minimum", "product", "fisher"),
  threshold = 0.1, normalize = TRUE, detailed = TRUE)

Arguments

seqtab

(Required). Integer matrix or phyloseq object. A feature table recording the observed abundances of each sequence variant (or OTU) in each sample. Rows should correspond to samples, and columns to sequences (or OTUs). If a phyloseq object is provided, the otu-table component will be extracted.

conc

(Optional). numeric. Required if performing frequency-based testing. A quantitative measure of the concentration of amplified DNA in each sample prior to sequencing. All values must be greater than zero, because zero is assumed to represent the complete absence of DNA. If seqtab was provided as a phyloseq object, the name of the appropriate sample-variable in that phyloseq object can be provided.

neg

(Optional). logical. Required if performing prevalence-based testing. TRUE if sample is a negative control, and FALSE if not (NA entries are not included in the testing). Extraction controls give the best results. If seqtab was provided as a phyloseq object, the name of the appropriate sample-variable in that phyloseq object can be provided.

method

(Optional). character. The method used to test for contaminants.

auto

(Default). frequency, prevalence or combined will be automatically selected based on whether just conc, just neg, or both were provided.

frequency

Contaminants are identified by frequency that varies inversely with sample DNA concentration.

prevalence

Contaminants are identified by increased prevalence in negative controls.

combined

The frequency and prevalence probabilities are combined with Fisher's method and used to identify contaminants.

minimum

The minimum of the frequency and prevalence probabilities is used to identify contaminants.

either

Contaminants are called if identified by either the frequency or prevalence methods.

both

Contaminants are called if identified by both the frequency and prevalence methods.
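
The combined and minimum options differ only in how the two per-sequence probabilities are merged. The following is a minimal base-R sketch of that idea, not the package's internal code: the vectors p.freq and p.prev hold made-up values, and fisher_combine is a hypothetical helper written for this illustration.

```r
# Hypothetical per-sequence probabilities from the two tests (made-up values).
p.freq <- c(0.01, 0.40, 0.20)
p.prev <- c(0.03, 0.60, 0.02)

# Fisher's method for two p-values: -2 * (log(p1) + log(p2)) is compared
# against a chi-squared distribution with 2 * 2 = 4 degrees of freedom.
# fisher_combine is a hypothetical helper, not a decontam function.
fisher_combine <- function(p1, p2) {
  stat <- -2 * (log(p1) + log(p2))
  pchisq(stat, df = 4, lower.tail = FALSE)
}

p.combined <- fisher_combine(p.freq, p.prev)  # as in method = "combined"
p.minimum  <- pmin(p.freq, p.prev)            # as in method = "minimum"
```

Fisher's method rewards agreement between the two tests, whereas the minimum lets a single strong signal decide on its own.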

batch

(Optional). factor, or any type coercible to a factor. Default NULL. If provided, should be a vector of length equal to the number of input samples that specifies which batch each sample belongs to (e.g., sequencing run). Contaminant identification will be performed independently within each batch. If seqtab was provided as a phyloseq object, the name of the appropriate sample-variable in that phyloseq object can be provided.

batch.combine

(Optional). Default "minimum". For each input sequence variant (or OTU), the probabilities calculated in each batch are combined into a single probability that is compared to threshold to classify contaminants. Valid values: "minimum", "product", "fisher".

threshold

(Optional). Default 0.1. The probability threshold below which (strictly less than) the null hypothesis (not a contaminant) should be rejected in favor of the alternative hypothesis (contaminant). A length-two vector can be provided when using the either or both methods: the first value is the threshold for the frequency test and the second for the prevalence test.
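
The strict comparison and the length-two threshold can be illustrated with a small base-R sketch. It only mirrors the rule as described above and is not taken from the package source; the probability vectors are made-up values.

```r
# Hypothetical per-sequence probabilities (made-up values).
p.freq <- c(0.01, 0.40, 0.10)
p.prev <- c(0.03, 0.60, 0.02)

# First value applies to the frequency test, second to the prevalence test.
threshold <- c(0.1, 0.5)

# Strictly less than: a probability exactly equal to the threshold
# (the third frequency value here) is not called a contaminant.
is.freq <- p.freq < threshold[1]
is.prev <- p.prev < threshold[2]

either <- is.freq | is.prev   # logic described for method = "either"
both   <- is.freq & is.prev   # logic described for method = "both"
```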

normalize

(Optional). Default TRUE. If TRUE, the input seqtab is normalized so that each row sums to 1 (converted to frequency). If FALSE, no normalization is performed (the data should already be frequencies or counts from equal-depth samples).
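
As a rough illustration of what row normalization means here, the sketch below converts a small made-up count matrix to per-sample frequencies in base R. It is an assumption about the arithmetic only, not the package's own implementation.

```r
# Made-up feature table: rows are samples, columns are sequences.
seqtab <- matrix(c(10, 30, 60,
                    5,  5,  0),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("sampleA", "sampleB"),
                                 c("seq1", "seq2", "seq3")))

# Divide each row by its total so that every row sums to 1.
freq <- sweep(seqtab, 1, rowSums(seqtab), "/")
```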

detailed

(Optional). Default TRUE. If TRUE, the return value is a data.frame containing diagnostic information on the contaminant decision. If FALSE, the return value is a logical vector containing the binary contaminant classifications.

Value

If detailed=TRUE, a data.frame with classification information is returned. If detailed=FALSE, a logical vector is returned, with TRUE indicating contaminants.
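
One common use of the logical vector is to drop the flagged columns from the feature table. The sketch below assumes a made-up table and a hand-written vector is.contam standing in for the result of a call with detailed=FALSE; it does not call the package.

```r
# Made-up feature table: rows are samples, columns are sequences.
seqtab <- matrix(1:6, nrow = 2,
                 dimnames = list(c("s1", "s2"), c("seq1", "seq2", "seq3")))

# Hypothetical classification, one entry per column of seqtab;
# stands in for the logical vector returned when detailed = FALSE.
is.contam <- c(FALSE, TRUE, FALSE)

# Keep only the sequences not flagged as contaminants.
# drop = FALSE keeps the result a matrix even if one column remains.
seqtab.clean <- seqtab[, !is.contam, drop = FALSE]
```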

Examples

st <- readRDS(system.file("extdata", "st.rds", package="decontam"))
# conc should be positive and non-zero
conc <- c(6413, 3581.0, 5375, 4107, 4291, 4260, 4171, 2765, 33, 48)
neg <- c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE)
# Use frequency or frequency and prevalence to identify contaminants
isContaminant(st, conc=conc, method="frequency", threshold=0.2)
isContaminant(st, conc=conc, neg=neg, method="both", threshold=c(0.1,0.5))

decontam documentation built on Nov. 8, 2020, 10:58 p.m.