slsd: Sample level signal denoising

View source: R/slsd.R

slsd    R Documentation

Sample level signal denoising

Description

In low-microbial-biomass settings, real microbes show proportional numbers of total k-mers, unique k-mers, and total assigned sequencing reads across samples. That is, the following three correlations (Spearman by default) are positive and significant when tested on the sample-level data provided in Kraken reports: cor(minimizer_len, minimizer_n_unique), cor(minimizer_len, total_reads) and cor(total_reads, minimizer_n_unique). A taxon is considered real when r1>0 & r2>0 & r3>0 & p1<0.05 & p2<0.05 & p3<0.05.
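The test described above can be sketched in base R for a single taxon. This is only an illustration, not the rsahmi implementation: the data values are invented, and `cor_pair` is a hypothetical helper introduced here. Only `stats::cor.test()` is used, which is the function the `...` argument is forwarded to.

```r
# Toy sample-level data for ONE hypothetical taxon across six samples.
# Column names follow the Kraken-report fields named above; values are invented.
taxon <- data.frame(
  minimizer_len      = c(120, 340, 560, 810, 1500, 2300),
  minimizer_n_unique = c(40, 95, 150, 260, 410, 700),
  total_reads        = c(5, 12, 21, 33, 58, 90)
)

# Hypothetical helper (not part of rsahmi): run one correlation test and
# return the coefficient and its p-value.
cor_pair <- function(x, y, method = "spearman") {
  res <- stats::cor.test(x, y, method = method)
  c(r = unname(res$estimate), p = res$p.value)
}

c1 <- cor_pair(taxon$minimizer_len, taxon$minimizer_n_unique)
c2 <- cor_pair(taxon$minimizer_len, taxon$total_reads)
c3 <- cor_pair(taxon$total_reads, taxon$minimizer_n_unique)

# One row per taxon, mirroring the r1/p1, r2/p2, r3/p3 naming used on this page.
stats_tbl <- data.frame(
  r1 = c1[["r"]], p1 = c1[["p"]],
  r2 = c2[["r"]], p2 = c2[["p"]],
  r3 = c3[["r"]], p3 = c3[["p"]]
)

# The decision rule from the description: all three correlations must be
# positive and significant at the 0.05 level.
is_real <- with(
  stats_tbl,
  r1 > 0 & r2 > 0 & r3 > 0 & p1 < 0.05 & p2 < 0.05 & p3 < 0.05
)
```

In this toy case all three quantities rise together across samples, so the taxon passes the rule. A contaminant whose k-mer counts do not scale with its read counts would be expected to fail at least one of the three tests.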

Usage

slsd(
  kreports,
  method = "spearman",
  ...,
  min_reads = 3L,
  min_minimizer_n_unique = 3L,
  min_number = 3L
)

Arguments

kreports

A list of kreport data for all samples, one element per sample, as returned by prep_dataset().

method

A character string indicating which correlation coefficient is to be used for the test. One of "pearson", "kendall", or "spearman", can be abbreviated.

...

Other arguments passed to cor.test.

min_reads

An integer, the minimum number of total reads used to filter taxa. SAHMI uses 2.

min_minimizer_n_unique

An integer, the minimum number of unique minimizers used to filter taxa. SAHMI uses 2.

min_number

An integer, the minimum number of samples per taxid. SAHMI uses 4.
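One plausible reading of how the three thresholds combine is sketched below in base R. The order of the two steps and the use of inclusive comparisons (`>=`) are assumptions made for illustration; this page does not state them. The table `kr` and its values are invented.

```r
# Hypothetical long-format table: one row per (sample, taxid); values invented.
kr <- data.frame(
  sample             = rep(c("s1", "s2", "s3", "s4"), times = 2),
  taxid              = rep(c("562", "1280"), each = 4),
  total_reads        = c(10, 8, 15, 4, 1, 9, 2, 7),
  minimizer_n_unique = c(30, 22, 41, 9, 1, 25, 2, 18)
)

# Defaults shown in the Usage section.
min_reads <- 3L
min_minimizer_n_unique <- 3L
min_number <- 3L

# Step 1 (assumed): keep rows that meet both per-sample thresholds.
kept <- kr[
  kr$total_reads >= min_reads &
    kr$minimizer_n_unique >= min_minimizer_n_unique,
]

# Step 2 (assumed): keep taxids still observed in at least `min_number` samples,
# since a correlation test needs several observations per taxid.
n_samples <- table(kept$taxid)
testable <- names(n_samples)[n_samples >= min_number]
```

Under these assumptions, taxid "562" keeps all four samples and remains testable, while taxid "1280" loses two low-count samples and is dropped for having fewer than `min_number` samples.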

Value

A polars DataFrame of correlation coefficients and p-values for cor(minimizer_len, minimizer_n_unique) (r1 and p1), cor(minimizer_len, total_reads) (r2 and p2) and cor(total_reads, minimizer_n_unique) (r3 and p3).

Examples

## Not run: 
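# `sahmi_datasets` should be the output of `prep_dataset()` for all samples
# The result is stored under a new name to avoid shadowing the `slsd()` function
slsd_res <- slsd(lapply(sahmi_datasets, `[[`, "kreport"))
# Keep taxids whose three correlations are all positive and significant
real_taxids_slsd <- slsd_res$filter(
    pl$col("r1")$gt(0),
    pl$col("r2")$gt(0),
    pl$col("r3")$gt(0),
    pl$col("p1")$lt(0.05),
    pl$col("p2")$lt(0.05),
    pl$col("p3")$lt(0.05)
)$get_column("taxid")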

## End(Not run)

rsahmi documentation built on April 4, 2025, 1:46 a.m.
