ps_dedupe | R Documentation |
Use one or more variables in the sample_data to identify and remove duplicate samples (leaving one sample per group).
methods:
method = "readcount" keeps the one sample in each duplicate group with the highest total number of reads (phyloseq::sample_sums)
method = "first" keeps the first sample in each duplicate group encountered in the row order of the sample_data
method = "last" keeps the last sample in each duplicate group encountered in the row order of the sample_data
method = "random" keeps a random sample from each duplicate group (set.seed for reproducibility)
More than one "duplicate" sample can be kept per group by setting n > 1.
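The four methods can be illustrated with a small base R sketch. This is not the microViz implementation: the helper `dedupe_sketch`, the toy data frame, and its column names are hypothetical, and only the n = 1 case is covered.

```r
# Toy sample table: two subjects, each sequenced more than once.
samples <- data.frame(
  sample_id = c("s1", "s2", "s3", "s4", "s5"),
  subject   = c("A", "A", "B", "B", "B"),
  reads     = c(100, 250, 300, 120, 280),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of microViz): return the sample_id kept
# in each group defined by the column `group`, under the chosen method.
dedupe_sketch <- function(df, group, method = "readcount") {
  # Row indices of df, split by the levels of the grouping column
  rows_by_group <- split(seq_len(nrow(df)), df[[group]])
  keep <- vapply(rows_by_group, function(rows) {
    switch(method,
      readcount = rows[which.max(df$reads[rows])],  # highest read total
      first     = rows[1],                          # first in row order
      last      = rows[length(rows)],               # last in row order
      random    = rows[sample.int(length(rows), 1)],
      stop("unknown method: ", method)
    )
  }, integer(1))
  df$sample_id[sort(keep)]
}

dedupe_sketch(samples, "subject", method = "readcount")
# [1] "s2" "s3"
dedupe_sketch(samples, "subject", method = "first")
# [1] "s1" "s3"
dedupe_sketch(samples, "subject", method = "last")
# [1] "s2" "s5"

# The random choice is only reproducible if the seed is set first.
set.seed(42)
dedupe_sketch(samples, "subject", method = "random")
```

In every case one sample survives per subject; the methods differ only in which row of each group is chosen.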
ps_dedupe(
  ps,
  vars,
  method = "readcount",
  verbose = TRUE,
  n = 1,
  .keep_group_var = FALSE,
  .keep_readcount = FALSE,
  .message_IDs = FALSE,
  .label_only = FALSE,
  .keep_all_taxa = FALSE
)
ps | phyloseq object |
vars | names of variables whose (combined) levels identify groups from which only 1 sample is desired |
method | keep the sample with the maximum "readcount", or the "first", "last", or "random" sample(s) encountered in the given sample_data order, for each duplicate group |
verbose | print a message about the number of groups and the number of samples dropped? |
n | number of 'duplicates' to keep per group, defaults to 1 |
.keep_group_var | keep grouping variable .GROUP. in phyloseq object? |
.keep_readcount | keep readcount variable .READCOUNT. in phyloseq object? |
.message_IDs | print a message listing the names of the dropped samples? |
.label_only | if TRUE, the samples will NOT be filtered, just labelled with a new logical variable .KEEP_SAMPLE. |
.keep_all_taxa | keep all taxa after removing duplicates? If FALSE, the default, taxa are removed if they never occur in any of the retained samples |
What happens when duplicated samples have exactly equal readcounts with method = "readcount"? The first encountered maximum is kept (in sample_data row order, as with method = "first").
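The tie-breaking rule matches how base R's `which.max()` behaves, as the short sketch below shows. Whether ps_dedupe uses `which.max()` internally is an assumption; the read counts and sample names are made up for illustration.

```r
# Hypothetical read counts for three duplicates of one subject, in row order.
reads <- c(s1 = 500, s2 = 900, s3 = 900)

# which.max() returns the index of the FIRST maximum, so the tie between
# s2 and s3 resolves to s2, the earlier row.
kept <- names(reads)[which.max(reads)]
kept
# [1] "s2"

# Reversing the row order changes which of the tied samples is kept.
kept_reversed <- names(rev(reads))[which.max(rev(reads))]
kept_reversed
# [1] "s3"
```

If the choice among tied samples matters, arrange the sample_data rows in the preferred order before de-duplicating.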
phyloseq object
ps_filter for filtering samples by sample_data variables
library(microViz)
data("dietswap", package = "microbiome")
dietswap
# let's pretend the dietswap data contains technical replicates from each subject
# we want to keep only one of them
ps_dedupe(dietswap, vars = "subject", method = "readcount", verbose = TRUE)
# contrived example to show identifying "duplicates" via the interaction of multiple columns
ps1 <- ps_dedupe(
  ps = dietswap, method = "readcount", verbose = TRUE,
  vars = c("timepoint", "group", "bmi_group")
)
phyloseq::sample_data(ps1)
ps2 <- ps_dedupe(
  ps = dietswap, method = "first", verbose = TRUE,
  vars = c("timepoint", "group", "bmi_group")
)
phyloseq::sample_data(ps2)
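The examples above do not exercise the .label_only and n arguments or the "random" method. The sketch below shows one way they might be used, assuming the microViz, phyloseq, and microbiome packages are installed; the object names `labelled` and `ps_random` and the seed value are arbitrary.

```r
library(microViz)
data("dietswap", package = "microbiome")

# Label instead of filter: every sample is retained, and the logical
# variable .KEEP_SAMPLE. marks the samples that would have been kept.
labelled <- ps_dedupe(dietswap, vars = "subject", .label_only = TRUE)
table(phyloseq::sample_data(labelled)$.KEEP_SAMPLE.)

# Keep up to two randomly chosen samples per subject. Set the seed first
# so the same samples are chosen on every run.
set.seed(123)
ps_random <- ps_dedupe(dietswap, vars = "subject", method = "random", n = 2)
phyloseq::nsamples(ps_random)
```

Labelling first is a cautious way to inspect which samples a given method would discard before committing to the filter.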