estimate_saturation: Estimate saturation of genes based on rarefaction of reads
In BenaroyaResearch/RNAseQC: Helper functions for conducting quality control of RNAseq data

estimate_saturation

R Documentation

Estimate saturation of genes based on rarefaction of reads

Description

Estimate the saturation of gene detection by rarefying the mapped read counts from each library in a read counts object. The function takes the read counts for each library and rarefies them at a series of depths to determine how thoroughly genes are being sampled. Optional settings include the minimum number of counts for a gene to be counted as "detected" (default 1) and, when using the sampling method, the number of intermediate points to sample (default 6) and the number of times to sample at each depth (default 5).

Usage

estimate_saturation(
  counts,
  max_reads = Inf,
  method = "sampling",
  ndepths = 6,
  nreps = 5,
  min_counts = 1,
  min_cpm = NULL,
  verbose = FALSE
)

Arguments

`counts`	a numeric matrix (or object that can be coerced to a matrix) containing read counts, or an object from which counts can be extracted. Should have genes in rows and samples in columns.
`max_reads`	the maximum number of reads to sample at. With the default (`Inf`), this value is the maximum of total read counts across all libraries.
`method`	character, either "division" or "sampling". Method "sampling" is slower but more realistic, and yields smoother curves. Method "division" is faster but coarser and less realistic. See Details for a more complete description.
`ndepths`	the number of depths to sample at. 0 is always included.
`nreps`	the number of sampling iterations to take for each library at each depth. With well-sampled libraries, 1 should be sufficient. With poorly-sampled libraries, sampling variance may be substantial, requiring higher values.
`min_counts`	the minimum number of counts for a gene to be counted as detected. Genes with sample counts >= this value are considered detected. Defaults to 1. Set to NULL to use min_cpm.
`min_cpm`	the minimum counts per million for a gene to be counted as detected. Only relevant with `method = "sampling"`. Genes with sample cpm >= this value are considered detected. Either this or `min_counts` should be specified, but not both; specifying both yields an error, as does specifying `min_cpm` with `method = "division"`. Defaults to NULL.
`verbose`	logical, whether to output the status of the estimation.
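
The two detection thresholds can give different answers for the same gene, because `min_cpm` scales with library size while `min_counts` does not. The following is a minimal base R sketch of that difference, not the package's own code; the toy matrix and the variable names (`lib_sizes`, `cpm`, `detected_by_counts`, `detected_by_cpm`) are assumptions made for illustration.

```r
# Toy counts matrix (made-up values): genes in rows, libraries in columns.
counts <- cbind(
  lib1 = c(geneA = 1, geneB = 10, geneC = 0, geneD = 999989),
  lib2 = c(geneA = 1, geneB = 10, geneC = 0, geneD = 99989)
)

# Library sizes: 1,000,000 reads for lib1 and 100,000 reads for lib2.
lib_sizes <- colSums(counts)

# Counts per million: divide each column by its library size, scale by 1e6.
cpm <- sweep(counts, 2, lib_sizes, "/") * 1e6

# Detection by raw counts (threshold 1, analogous to min_counts = 1).
detected_by_counts <- colSums(counts >= 1)

# Detection by cpm (threshold 5, analogous to min_cpm = 5).
detected_by_cpm <- colSums(cpm >= 5)

print(detected_by_counts)
print(detected_by_cpm)
```

In this toy example geneA has one read in both libraries, but it only passes the cpm threshold in the smaller library, where a single read corresponds to roughly 10 cpm.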

Details

The method parameter determines the approach used to estimate the number of genes detected at different sequencing depths. Method "division" simply divides the counts for each gene by a series of scaling factors, then counts the genes whose adjusted counts exceed the detection threshold. Method "sampling" generates a number of sets (nreps) of simulated counts for each library at each sequencing depth, by probabilistically simulating counts using observed proportions. It then counts the number of genes that meet the detection threshold in each simulation, and takes the arithmetic mean of the values for each library at each depth.
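
The two approaches described above can be sketched in base R for a single library. This is an illustration under stated assumptions rather than the package's implementation: the counts and depths are made up, the "division" step is assumed to scale counts in proportion to depth, the "sampling" step is assumed to use multinomial draws via `rmultinom()`, and the column name `sat_var` is hypothetical.

```r
set.seed(1)

# Made-up counts for one library; g5 is never observed.
lib_counts <- c(g1 = 500, g2 = 50, g3 = 5, g4 = 1, g5 = 0)
total_reads <- sum(lib_counts)       # 556 reads
depths <- c(0, 139, 278, 556)        # depths to evaluate, 0 included
min_counts <- 1                      # detection threshold

# "division"-style sketch: scale every gene's count down to the target
# depth, then count genes whose scaled count meets the threshold.
sat_division <- sapply(depths, function(d) {
  scaled <- lib_counts * d / total_reads
  sum(scaled >= min_counts)
})

# "sampling"-style sketch: simulate nreps libraries of d reads each from
# the observed gene proportions, count detected genes in each simulation,
# then summarise with the mean and variance across replicates.
nreps <- 5
props <- lib_counts / total_reads
sim_stats <- t(sapply(depths, function(d) {
  sims <- rmultinom(nreps, size = d, prob = props)  # genes x nreps
  n_detected <- colSums(sims >= min_counts)
  c(depth = d, sat = mean(n_detected), sat_var = var(n_detected))
}))

print(sat_division)
print(sim_stats)
```

The division sketch is deterministic and moves in steps as scaled counts cross the threshold, whereas the sampling sketch lets low-count genes such as g4 appear in some replicates and not others, which is why it benefits from averaging over `nreps`.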

Value

A data frame containing ndepths x nSamples rows, with one row for each sample at each depth. Columns include "sample" (the sample identifier), "depth" (the depth value for that iteration), and "sat" (the number of genes detected at that depth for that sample). For method "sampling", the data frame includes an additional column with the variance of genes detected across all replicates of each sample at each depth.
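
A call might look like the following sketch, which uses only the arguments listed under Usage and the documented "sample", "depth", and "sat" columns. The simulated matrix, the object name `sat`, and the chosen argument values are assumptions for illustration; the call is guarded so the code still runs when the RNAseQC package is not installed.

```r
set.seed(42)

# Simulated counts (made-up data): 200 genes in rows, 3 libraries in columns.
counts <- matrix(
  rpois(200 * 3, lambda = 20),
  nrow = 200,
  dimnames = list(paste0("gene", 1:200), paste0("lib", 1:3))
)

# Only call the package function if RNAseQC is available.
if (requireNamespace("RNAseQC", quietly = TRUE)) {
  sat <- RNAseQC::estimate_saturation(
    counts,
    method = "sampling",
    ndepths = 4,
    nreps = 2,
    min_counts = 1
  )

  # One row per sample per depth; inspect the documented columns.
  print(head(sat[, c("sample", "depth", "sat")]))
}
```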

BenaroyaResearch/RNAseQC documentation built on Dec. 12, 2024, 8:13 p.m.
