remove_contaminants: Identifying contaminants and false-positive taxa (cell line...

View source: R/contaminants.R

remove_contaminants    R Documentation

Identifying contaminants and false-positive taxa (cell line quantile test)

Description

Identifies contaminants and false-positive taxa by applying a quantile test to the current study against cell line data.

Usage

remove_contaminants(
  kraken_reports,
  study = "current study",
  taxon = c("d__Bacteria", "d__Fungi", "d__Viruses"),
  quantile = 0.95,
  alpha = 0.05,
  alternative = "greater",
  exclusive = FALSE
)

Arguments

kraken_reports

A character vector of paths to all kraken report files.

study

A string giving the study name, used to distinguish the current study from the cell line data.

taxon

A character vector specifying the taxa of interest. Names should follow the kraken style: the rank code, two underscores, and the scientific name of the taxon (e.g., "d__Viruses").

quantile

Probabilities with values in [0, 1] specifying the quantile to calculate.

alpha

Significance level. Taxa with p-values below alpha are reported as significant.

alternative

A string specifying the alternative hypothesis. Must be one of "two.sided", "greater" (default) or "less". You can specify just the initial letter.

exclusive

A boolean value indicating whether taxa not found in the cell line data should be regarded as truly present. Default: FALSE.
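
For intuition about how quantile, alpha, and alternative interact, a one-sample quantile test can be sketched in base R as follows. This is only an illustration under stated assumptions, not rsahmi's actual implementation: the helper quantile_test_sketch() and its arguments x (abundances in the current study) and ref (abundances in the cell line data) are hypothetical, the values are made up, and the test is expressed as a binomial test on the number of study values that exceed the reference quantile.

```r
# Hypothetical helper for illustration; not part of rsahmi.
quantile_test_sketch <- function(x, ref, prob = 0.95,
                                 alternative = c("greater", "two.sided", "less")) {
    # match.arg() accepts an unambiguous prefix such as "g" for "greater"
    alternative <- match.arg(alternative)
    # Reference value: the `prob` quantile of the reference abundances
    q_ref <- stats::quantile(ref, probs = prob, names = FALSE)
    # If the `prob` quantile of `x` equalled `q_ref`, each value of `x`
    # would exceed `q_ref` with probability 1 - prob
    n_above <- sum(x > q_ref)
    stats::binom.test(
        n_above, length(x),
        p = 1 - prob, alternative = alternative
    )$p.value
}

ref <- seq(0.01, 1, by = 0.01)      # made-up reference abundances
high <- c(10, 12, 15, 20, 30)       # made-up values, all above the reference
low <- c(0.1, 0.2, 0.3, 0.4, 0.5)   # made-up values, all within the reference
alpha <- 0.05
quantile_test_sketch(high, ref) < alpha # TRUE: would count as significant
quantile_test_sketch(low, ref) < alpha  # FALSE: would not count as significant
```

With alternative = "greater", a small p-value suggests that the study values sit above the chosen quantile of the reference more often than expected by chance.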

Value

A polars DataFrame with the following attributes:

  1. pvalues: p-values from the quantile test.

  2. exclusive: taxids found in the current study but not in the cell line data.

  3. significant: taxids whose p-values are below alpha.

  4. truly: taxids regarded as truly present, determined by alpha and exclusive. If exclusive is TRUE, this is the union of exclusive and significant; otherwise, it is the same as significant.
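
The relationship between the significant, exclusive, and truly attributes can be illustrated with plain set operations. The sketch below is an illustration under stated assumptions, not rsahmi code: derive_truly() is a hypothetical helper, and the taxids and p-values are made up.

```r
# Hypothetical helper for illustration; not part of rsahmi.
derive_truly <- function(pvalues, exclusive_taxids, alpha = 0.05,
                         exclusive = FALSE) {
    # Taxids whose p-value falls below the significance level
    significant <- names(pvalues)[pvalues < alpha]
    if (exclusive) {
        # Also keep taxids that never appear in the cell line data
        union(exclusive_taxids, significant)
    } else {
        significant
    }
}

# Made-up p-values, named by taxid
pvalues <- c("562" = 0.001, "1280" = 0.2, "5476" = 0.03)
# Made-up taxid seen in the current study but not in the cell line data
exclusive_taxids <- "10376"

derive_truly(pvalues, exclusive_taxids)                   # significant only
derive_truly(pvalues, exclusive_taxids, exclusive = TRUE) # union of both sets
```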

Examples

## Not run: 
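# `pl` comes from the polars package; the plotting functions come from ggplot2
library(polars)
library(ggplot2)
# `paths` should be the output directory for each sample from
# `blit::kraken2()`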
truly_microbe <- remove_contaminants(
    kraken_reports = file.path(paths, "kraken_report.txt"),
    quantile = 0.99, exclusive = FALSE
)
microbe_for_plot <- attr(truly_microbe, "truly")[
    order(attr(truly_microbe, "pvalue")[attr(truly_microbe, "truly")])
]
microbe_for_plot <- microbe_for_plot[
    !microbe_for_plot %in% attr(truly_microbe, "exclusive")
]
ggplot(
    truly_microbe$filter(pl$col("taxid")$is_in(microbe_for_plot))$
        to_data_frame(),
    aes(rpmm)
) +
    geom_density(aes(fill = study), alpha = 0.5) +
    scale_x_log10() +
    facet_wrap(facets = vars(taxa), scales = "free") +
    theme(
        strip.clip = "off",
        axis.text = element_blank(),
        axis.ticks = element_blank(),
        legend.position = "inside",
        legend.position.inside = c(1, 0),
        legend.justification.inside = c(1, 0)
    )

## End(Not run)

rsahmi documentation built on April 4, 2025, 1:46 a.m.