gene_filter: Gene filtering based on heterogeneity

gene_filter    R Documentation

Gene filtering based on heterogeneity

Description

This function filters out genes that show low heterogeneity, as measured by Shannon's entropy.

Usage

## S4 method for signature 'matrix'
gene_filter(
  x,
  from = min(x, na.rm = TRUE),
  to = max(x, na.rm = TRUE),
  nBins = 20,
  heterogeneity_threshold = 0.1
)

## S4 method for signature 'SummarizedExperiment'
gene_filter(
  x,
  from = min(assay(x, awst_values), na.rm = TRUE),
  to = max(assay(x, awst_values), na.rm = TRUE),
  nBins = 20,
  heterogeneity_threshold = 0.1,
  awst_values = "awst"
)

Arguments

x

a matrix of transformed gene expression counts (typically the result of awst), or a SummarizedExperiment that contains such values in one of its assays.

from

the minimum value from which to start binning data.

to

the maximum value for the binning of the data.

nBins

the number of bins.

heterogeneity_threshold

the threshold used for the filtering: genes whose entropy is lower than this value are removed.

awst_values

integer scalar or string indicating the assay that contains the awst-transformed values to use as input.

Details

Shannon's entropy is computed on the categorized (binned) data after the AWST transformation. Genes whose entropy is lower than the predefined threshold are deemed to carry too little information to be useful for classifying the samples, and are therefore removed.
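
The idea can be illustrated with a minimal sketch in base R. This is not the package's implementation: the helper binned_entropy() is hypothetical, the toy matrix is assumed to have genes in rows, and dividing by log(n_bins) so that the entropy lies in [0, 1] is an assumption about the scale on which the threshold is applied.

```r
# Hypothetical helper, not part of the awst package.
# Computes Shannon's entropy of one gene's values after binning them
# into n_bins equal-width bins between `from` and `to`.
binned_entropy <- function(values, from, to, n_bins) {
  breaks <- seq(from, to, length.out = n_bins + 1)
  # Left-closed bins; the last bin also includes `to`.
  bins <- cut(values, breaks = breaks, right = FALSE, include.lowest = TRUE)
  counts <- as.vector(table(bins))
  # Empty bins contribute nothing to the entropy, so drop them.
  p <- counts[counts > 0] / sum(counts)
  # Assumption: normalize by log(n_bins) so the result lies in [0, 1].
  -sum(p * log(p)) / log(n_bins)
}

# Toy data, assumed layout: one gene per row, one sample per column.
toy <- rbind(
  flat   = c(1, 1, 1, 1),  # every sample falls in the same bin
  spread = c(0, 1, 2, 3)   # every sample falls in a different bin
)

h <- apply(toy, 1, binned_entropy, from = 0, to = 4, n_bins = 4)

# Keep only the genes whose entropy exceeds the threshold.
filtered <- toy[h > 0.1, , drop = FALSE]
```

In this toy case the gene with identical values in all samples has zero entropy and is dropped, while the gene whose values spread evenly over the bins has the maximum entropy and is kept.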

Value

If 'x' is a matrix, it returns a filtered matrix. If 'x' is a 'SummarizedExperiment', it returns a filtered 'SummarizedExperiment'.

Methods (by class)

  • matrix: the input is a matrix of awst-transformed values.

  • SummarizedExperiment: the input is a SummarizedExperiment with awst-transformed values in one of its assays.

References

Risso and Pagnotta (2019). Within-sample standardization and asymmetric winsorization lead to accurate classification of RNA-seq expression profiles. Manuscript in preparation.

Examples

set.seed(222)
x <- matrix(rpois(75, lambda=5), ncol=5, nrow=15)
a <- awst(x)
gene_filter(a)


drisso/awst documentation built on Jan. 29, 2024, 3:42 p.m.