collapseNoMismatch: Combine together sequences that are identical up to shifts and/or length.
In benjjneb/dada2: Accurate, high-resolution sample inference from amplicon sequencing data

collapseNoMismatch

R Documentation

Combine together sequences that are identical up to shifts and/or length.

Description

This function takes as input a sequence table and returns a sequence table in which any sequences that are identical up to shifts or length variation, i.e. that have no mismatches or internal indels when aligned, are collapsed together. The most abundant sequence is chosen as the representative of the collapsed sequences. This function can be thought of as implementing greedy 100% OTU clustering with end-gapping ignored.
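The greedy procedure described above can be illustrated with a short base-R sketch. This is a simplified illustration, not dada2's implementation. The helper names `overlaps_without_mismatch()` and `greedy_collapse()` are hypothetical, and the exact-substring overlap test stands in for the end-gap-free Needleman-Wunsch alignment that the `vec` and `band` arguments configure. Tie-breaking and the choice of the first matching representative are assumptions of the sketch.

```r
# Illustrative sketch only, not dada2's implementation; helper names are hypothetical.

# TRUE if a and b overlap by >= min_overlap bases with no mismatches and no
# internal gaps, i.e. one sequence is shifted and/or truncated relative to the other.
overlaps_without_mismatch <- function(a, b, min_overlap = 20) {
  la <- nchar(a)
  lb <- nchar(b)
  # 'shift' places the first base of b opposite base (shift + 1) of a.
  for (shift in seq(-(lb - 1), la - 1)) {
    start_a <- max(1, shift + 1)
    end_a <- min(la, shift + lb)
    if (end_a - start_a + 1 < min_overlap) next  # overlap too short
    if (substr(a, start_a, end_a) == substr(b, start_a - shift, end_a - shift)) {
      return(TRUE)
    }
  }
  FALSE
}

# Greedy 100% clustering with end gaps ignored: visit sequences from most to
# least abundant; each joins the first existing representative it matches,
# otherwise it becomes a new representative.
greedy_collapse <- function(abund, min_overlap = 20) {
  seqs <- names(abund)[order(-abund)]  # most abundant first; ties keep input order
  reps <- character(0)
  rep_of <- character(0)
  for (q in seqs) {
    hit <- q
    for (r in reps) {
      if (overlaps_without_mismatch(r, q, min_overlap)) {
        hit <- r
        break
      }
    }
    if (hit == q) reps <- c(reps, q)
    rep_of[q] <- hit
  }
  rep_of  # named vector: input sequence -> representative sequence
}
```

In this sketch, a sequence that is a 28-base suffix of a more abundant 30-base sequence is assigned to it, while a 30-base sequence carrying a single substitution becomes its own representative.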

Usage

collapseNoMismatch(
  seqtab,
  minOverlap = 20,
  orderBy = "abundance",
  identicalOnly = FALSE,
  vec = TRUE,
  band = -1,
  verbose = FALSE
)

Arguments

`seqtab`	(Required). A sample by sequence matrix, the return of `makeSequenceTable`.
`minOverlap`	(Optional). `numeric(1)`. Default 20. The minimum overlap, in nucleotides, required between two sequences for them to be collapsed together.
`orderBy`	(Optional). `character(1)`. Default "abundance". Specifies how the sequences (columns) of the returned table should be ordered (decreasing). Valid values: "abundance", "nsamples", NULL.
`identicalOnly`	(Optional). `logical(1)`. Default FALSE. If TRUE, only identical sequences (i.e. duplicates) are collapsed together.
`vec`	(Optional). `logical(1)`. Default TRUE. Use the vectorized aligner. Should be turned off if sequences exceed 2kb in length.
`band`	(Optional). `numeric(1)`. Default -1 (no banding). The Needleman-Wunsch alignment can be banded. This value specifies the radius of that band. Set band = -1 to turn off banding.
`verbose`	(Optional). `logical(1)`. Default FALSE. If TRUE, a summary of the function's results is printed to standard output.
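The effect of `identicalOnly` and `orderBy` can be sketched with two small base-R helpers. This is a hedged illustration, not the package's code: `collapse_identical()` and `order_columns()` are hypothetical names, and the tie-breaking rule (ties keep their input order) is an assumption.

```r
# Hypothetical helper: sum columns that carry the same sequence name, i.e. the
# kind of merge that identicalOnly = TRUE restricts collapsing to.
collapse_identical <- function(seqtab) {
  # rowsum() groups rows, so transpose, group by sequence, then transpose back.
  out <- t(rowsum(t(seqtab), group = colnames(seqtab), reorder = FALSE))
  storage.mode(out) <- "integer"
  out
}

# Hypothetical helper mirroring the orderBy choices (decreasing order).
order_columns <- function(seqtab, orderBy = "abundance") {
  if (is.null(orderBy)) return(seqtab)  # NULL: leave the column order unchanged
  key <- switch(orderBy,
    abundance = colSums(seqtab),      # total reads per sequence
    nsamples  = colSums(seqtab > 0),  # number of samples containing the sequence
    stop("orderBy must be 'abundance', 'nsamples' or NULL"))
  seqtab[, order(-key), drop = FALSE]  # decreasing; ties keep input order
}
```

The two orderings can disagree: a sequence with 10 reads in a single sample ranks ahead of one with 2 and 3 reads in two samples under "abundance", but behind it under "nsamples".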

Value

Named integer matrix. A row for each sample, and a column for each collapsed sequence across all the samples. Note that the columns are named by their sequences, which can make the matrix unwieldy to display. Columns are in the same order (modulo the removed columns) as in the input matrix.
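How a returned matrix of this shape could be assembled is sketched below in base R. The sketch assumes a hypothetical mapping `rep_of` from every input sequence to the sequence it was collapsed into (representatives map to themselves); `apply_collapse()` is likewise a hypothetical name, not a dada2 function. It shows that folding counts into representatives preserves each sample's total and that dropping the absorbed columns leaves the remaining columns in their input order.

```r
# Hypothetical sketch: fold each collapsed column into its representative.
# 'rep_of' is a named character vector: input sequence -> representative,
# where every representative maps to itself.
apply_collapse <- function(seqtab, rep_of) {
  stopifnot(setequal(names(rep_of), colnames(seqtab)))
  out <- seqtab
  for (sq in colnames(seqtab)) {
    r <- rep_of[[sq]]
    if (r != sq) {
      out[, r] <- out[, r] + seqtab[, sq]  # add counts to the representative
    }
  }
  keep <- colnames(seqtab) %in% unname(rep_of)  # representatives only
  out[, keep, drop = FALSE]                     # input column order is retained
}
```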

Examples

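library(dada2)
# Dereplicate two example forward-read files shipped with the package
derep1 <- derepFastq(system.file("extdata", "sam1F.fastq.gz", package="dada2"))
derep2 <- derepFastq(system.file("extdata", "sam2F.fastq.gz", package="dada2"))
# Denoise each sample with the package's example error matrix tperr1
dada1 <- dada(derep1, tperr1)
dada2 <- dada(derep2, tperr1)
# Build the sample-by-sequence table, then collapse sequences differing only by shifts or length
seqtab <- makeSequenceTable(list(sample1=dada1, sample2=dada2))
collapseNoMismatch(seqtab)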

benjjneb/dada2 documentation built on June 10, 2025, 10:43 p.m.
