subsampleCounts: Subsample Counts

subsampleCounts R Documentation

Subsample Counts

Description

subsampleCounts randomly subsamples the counts in a SummarizedExperiment and returns a modified object in which each sample has the same total number of observations/counts/reads.

Usage

subsampleCounts(
  x,
  assay.type = assay_name,
  assay_name = "counts",
  min_size = min(colSums2(assay(x))),
  replace = TRUE,
  name = "subsampled",
  verbose = TRUE,
  ...
)

## S4 method for signature 'SummarizedExperiment'
subsampleCounts(
  x,
  assay.type = assay_name,
  assay_name = "counts",
  min_size = min(colSums2(assay(x))),
  replace = TRUE,
  name = "subsampled",
  verbose = TRUE,
  ...
)

Arguments

x

A SummarizedExperiment object.

assay.type

A single character value selecting the SummarizedExperiment assay used for random subsampling. Only counts are useful here; other, transformed data as input will give meaningless output.

assay_name

A single character value specifying which assay to use for the calculation. (Please use assay.type instead. At some point assay_name will be disabled.)

min_size

A single integer value equal to the number of counts being simulated. This can be the lowest total count found in any sample or a user-specified number.

replace

Logical. Default is TRUE, that is, subsampling is done with replacement. See phyloseq::rarefy_even_depth for details on the implications of this parameter.

name

A single character value specifying the name of the subsampled abundance table.

verbose

Logical. Default is TRUE. When TRUE, an additional message about the random number used is printed.

...

Additional arguments; not used.
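
As a rough illustration of how min_size relates to per-sample totals, the following base R sketch computes sequencing depths for a toy count matrix. The matrix, its dimnames, and the threshold of 30 are made up for illustration, and base colSums() stands in for the colSums2() call shown in Usage; this is not the package's internal code.

```r
# Toy count matrix (hypothetical data): 3 taxa (rows) x 3 samples (columns).
counts <- matrix(
  c(10, 0, 5,
    3, 7, 2,
    40, 20, 15),
  nrow = 3,
  dimnames = list(paste0("taxon", 1:3), paste0("sample", 1:3))
)

# Total counts per sample (sequencing depth).
depth <- colSums(counts)

# The default min_size corresponds to the smallest per-sample total.
default_min_size <- min(depth)

# With a user-specified min_size, samples below it would be dropped
# (see the comments in Examples). 30 is an arbitrary illustrative threshold.
keep <- depth >= 30
```

Here depth is 15, 12 and 75, so the default depth would be 12, while a threshold of 30 would retain only sample3.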

Details

Although the subsampling approach is highly debated in microbiome research, we include the subsampleCounts function because there may be some instances where it can be useful. Note that the output of subsampleCounts is not equivalent to the input, and any results have to be verified against the original dataset. To maintain reproducibility, please define the seed using set.seed() before calling this function.
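
Why the seed matters can be seen in a minimal base R sketch of subsampling a single sample to a fixed depth. The helper subsample_vector and the toy counts are hypothetical and are not part of mia; the sketch only assumes that subsampling amounts to a random draw of reads, with or without replacement, and does not claim to reproduce the package's implementation.

```r
# Hypothetical helper (not part of mia): subsample one sample's counts
# down to `size` reads and return the new per-feature counts.
subsample_vector <- function(counts, size, replace = TRUE) {
  if (replace) {
    # Draw feature indices with probability proportional to their counts.
    idx <- sample(seq_along(counts), size = size, replace = TRUE, prob = counts)
  } else {
    # Expand counts into individual reads and draw without replacement;
    # this requires size <= sum(counts).
    reads <- rep(seq_along(counts), times = counts)
    idx <- reads[sample.int(length(reads), size = size)]
  }
  # Count how often each feature was drawn.
  tabulate(idx, nbins = length(counts))
}

# Toy sample (hypothetical data).
counts <- c(taxonA = 10, taxonB = 0, taxonC = 5, taxonD = 85)

set.seed(123)
first <- subsample_vector(counts, size = 50)
set.seed(123)
second <- subsample_vector(counts, size = 50)

identical(first, second)  # same seed, so the same draw
sum(first)                # equals the requested size
```

Without the set.seed() calls, repeated runs would generally give different subsampled counts, which is why results should be checked against the original data.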

Value

subsampleCounts returns x with the subsampled data.

Author(s)

Sudarshan A. Shetty and Felix G.M. Ernst

References

McMurdie PJ, Holmes S. Waste not, want not: why rarefying microbiome data is inadmissible. PLoS computational biology. 2014 Apr 3;10(4):e1003531.

Gloor GB, Macklaim JM, Pawlowsky-Glahn V & Egozcue JJ (2017) Microbiome Datasets Are Compositional: And This Is Not Optional. Frontiers in Microbiology 8: 2224. doi: 10.3389/fmicb.2017.02224

Weiss S, Xu ZZ, Peddada S, Amir A, Bittinger K, Gonzalez A, Lozupone C, Zaneveld JR, Vázquez-Baeza Y, Birmingham A, Hyde ER. Normalization and microbial differential abundance strategies depend upon data characteristics. Microbiome. 2017 Dec;5(1):1-8.

Examples

# Samples in the TreeSE with fewer total counts than the specified min_size
# are removed. Features that are not present in any sample after subsampling
# are removed as well.
data(GlobalPatterns)
tse <- GlobalPatterns
set.seed(123)
tse.subsampled <- subsampleCounts(tse,
                                  min_size = 60000,
                                  name = "subsampled")
tse.subsampled
dim(tse)
dim(tse.subsampled)


microbiome/mia documentation built on April 27, 2024, 4:04 a.m.