HTSBasicFilter: Implement basic filters for transcriptome sequencing data.


Description

Implement a variety of basic filters for transcriptome sequencing data.

Usage

HTSBasicFilter(x, ...)

## S4 method for signature 'matrix'
HTSBasicFilter(x, method, cutoff.type = "value",
  cutoff = 10, length = NA, normalization = c("TMM", "DESeq", "none"))

## S4 method for signature 'data.frame'
HTSBasicFilter(x, method, cutoff.type = "value",
  cutoff = 10, length = NA, normalization = c("TMM", "DESeq", "none"))

## S4 method for signature 'DGEList'
HTSBasicFilter(x, method, cutoff.type = "value",
  cutoff = 10, length = NA, normalization = c("TMM", "DESeq",
  "pseudo.counts", "none"))

## S4 method for signature 'DGEExact'
HTSBasicFilter(x, method, cutoff.type = "value",
  cutoff = 10, length = NA, normalization = c("TMM", "DESeq",
  "pseudo.counts", "none"))

## S4 method for signature 'DGEGLM'
HTSBasicFilter(x, method, cutoff.type = "value",
  cutoff = 10, length = NA, normalization = c("TMM", "DESeq", "none"))

## S4 method for signature 'DGELRT'
HTSBasicFilter(x, method, cutoff.type = "value",
  cutoff = 10, length = NA, normalization = c("TMM", "DESeq", "none"))

## S4 method for signature 'DESeqDataSet'
HTSBasicFilter(x, method, cutoff.type = "value",
  cutoff = 10, length = NA, normalization = c("DESeq", "TMM", "none"),
  pAdjustMethod = "BH")

Arguments

x

A numeric matrix or data.frame of counts with dimension (g x n), for g genes in n samples; alternatively, a DGEList, DGEExact, DGEGLM, DGELRT, or DESeqDataSet object.

...

Additional optional arguments.

method

Basic filtering method to be used: “mean”, “sum”, “rpkm”, “variance”, “cpm”, “max”, “cpm.mean”, “cpm.sum”, “cpm.variance”, “cpm.max”, “rpkm.mean”, “rpkm.sum”, “rpkm.variance”, or “rpkm.max”.

cutoff.type

Type of cutoff to be used: either a numeric value giving the number of samples to be used for filtering (only when method = “cpm” or “rpkm”), or one of “value”, “number”, or “quantile”.

cutoff

Cutoff to be used for the chosen filter.

length

Vector of length g containing the length of each gene in x; optional except when an RPKM-based method (“rpkm”, “rpkm.mean”, “rpkm.sum”, “rpkm.variance”, or “rpkm.max”) is used.

normalization

Normalization method used to correct for differences in library sizes: “TMM” (Trimmed Mean of M-values), “DESeq” (the normalization method proposed in the DESeq package), “pseudo.counts” (pseudo-counts obtained via quantile-quantile normalization in the edgeR package; only available for objects of class DGEList and DGEExact), or “none” (to be used only if the user is certain that no normalization is required, or if the data have already been normalized by an alternative method).

pAdjustMethod

The method used to adjust p-values (DESeqDataSet objects only); see ?p.adjust.
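The library-size-normalized quantities that the CPM- and RPKM-based filters operate on can be sketched in base R as follows. This is a minimal illustration, not the HTSFilter implementation: the helper names cpm_sketch, rpkm_sketch, and size_factors_sketch are hypothetical, TMM factors are assumed to be supplied from elsewhere (for example by edgeR's calcNormFactors()), and the DESeq-style size factors use the median-of-ratios estimator. In practice, edgeR's cpm() and rpkm() and DESeq2's estimateSizeFactors() compute these quantities.

```r
# Minimal base-R sketch; the helpers below are hypothetical and are NOT part
# of HTSFilter. TMM normalization factors are assumed to be computed elsewhere.

# Counts per million: divide each sample by its effective library size
# (library size times an optional normalization factor, e.g. TMM).
cpm_sketch <- function(counts, norm.factors = rep(1, ncol(counts))) {
  eff.lib <- colSums(counts) * norm.factors
  sweep(counts, 2, eff.lib, "/") * 1e6
}

# RPKM: CPM divided by gene length in kilobases. gene.length needs one
# entry per gene (row of counts), i.e. a vector of length g.
rpkm_sketch <- function(counts, gene.length,
                        norm.factors = rep(1, ncol(counts))) {
  stopifnot(length(gene.length) == nrow(counts))
  cpm_sketch(counts, norm.factors) / (gene.length / 1e3)
}

# DESeq-style median-of-ratios size factors, using only genes with
# nonzero counts in every sample.
size_factors_sketch <- function(counts) {
  pos <- counts[rowSums(counts == 0) == 0, , drop = FALSE]
  log.geo.mean <- rowMeans(log(pos))
  apply(log(pos), 2, function(x) exp(median(x - log.geo.mean)))
}

# Toy data: 3 genes (rows) x 2 samples (columns).
counts <- matrix(c(100,  200,
                   300,  600,
                   600, 1200), nrow = 3, byrow = TRUE)
gene.length <- c(1000, 2000, 500)   # hypothetical gene lengths in bases

cpm.values  <- cpm_sketch(counts)
rpkm.values <- rpkm_sketch(counts, gene.length)
sf          <- size_factors_sketch(counts)
# Counts divided by their size factors (DESeq-style normalized counts).
norm.counts <- sweep(counts, 2, sf, "/")
```

With these toy counts both samples have the same composition and the second library is simply twice as deep, so the CPM values agree across samples and the size factors come out as 1/sqrt(2) and sqrt(2).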

Details

This function implements basic filters for high-throughput sequencing data based on a variety of per-gene statistics: the mean, sum, RPKM, variance, CPM, or maximum of the normalized counts; the mean, sum, variance, or maximum of the CPM values; and the mean, sum, variance, or maximum of the RPKM values. Genes may be filtered according to a given cutoff value (cutoff.type = “value”), a given number of genes (cutoff.type = “number”), or a given quantile (cutoff.type = “quantile”). For method = “cpm” or “rpkm”, cutoff.type may instead be a number of samples.
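The filtering criteria can be sketched in base R as follows, assuming the per-gene statistic has already been computed from normalized counts (for example, counts divided by size factors, or CPM values). This is an illustration under stated assumptions rather than the package's code: the helper names are hypothetical; cutoff.type = “value” is assumed to keep genes whose statistic is at least cutoff, “number” to keep the cutoff top-ranked genes, and “quantile” to keep genes at or above the given quantile of the statistic; the sample-count variant used with a numeric cutoff.type follows the common edgeR idiom of keeping genes that reach the cutoff in at least that many samples.

```r
# Hypothetical sketch of the basic filter criteria; NOT HTSFilter code.
# The exact inequalities (>= vs >) and the meaning of "number" are assumptions.

# Per-gene statistic computed across samples (rows = genes).
gene_stat_sketch <- function(norm.counts,
                             method = c("mean", "sum", "variance", "max")) {
  method <- match.arg(method)
  switch(method,
         mean     = rowMeans(norm.counts),
         sum      = rowSums(norm.counts),
         variance = apply(norm.counts, 1, var),
         max      = apply(norm.counts, 1, max))
}

# Returns TRUE for genes that are kept.
basic_filter_sketch <- function(stat,
                                cutoff.type = c("value", "number", "quantile"),
                                cutoff) {
  cutoff.type <- match.arg(cutoff.type)
  switch(cutoff.type,
         # keep genes whose statistic reaches the cutoff value
         value    = stat >= cutoff,
         # keep the 'cutoff' genes with the largest statistic
         number   = rank(-stat, ties.method = "first") <= cutoff,
         # keep genes at or above the 'cutoff' quantile (0 < cutoff < 1)
         quantile = stat >= unname(quantile(stat, probs = cutoff)))
}

# Numeric cutoff.type with method "cpm" or "rpkm": keep genes whose value
# reaches 'cutoff' in at least 'n.samples' samples (edgeR-style idiom).
sample_filter_sketch <- function(values, n.samples, cutoff) {
  rowSums(values >= cutoff) >= n.samples
}

# Toy normalized counts: 4 genes x 2 samples.
norm.counts <- matrix(c( 1,  3,
                        10, 10,
                         0,  2,
                        50, 70), nrow = 4, byrow = TRUE)

gene.sum <- gene_stat_sketch(norm.counts, "sum")   # 4, 20, 2, 120
keep.value    <- basic_filter_sketch(gene.sum, "value", 10)
keep.number   <- basic_filter_sketch(gene.sum, "number", 1)
keep.quantile <- basic_filter_sketch(gene.sum, "quantile", 0.5)
keep.samples  <- sample_filter_sketch(norm.counts, n.samples = 1, cutoff = 3)
```

With these toy values, the “value” rule (cutoff 10) and the “quantile” rule (cutoff 0.5, i.e. the median sum of 12) both keep genes 2 and 4, the “number” rule with cutoff 1 keeps only gene 4, and the sample-count rule keeps genes 1, 2, and 4.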

Value

Author(s)

Andrea Rau, Melina Gallopin, Gilles Celeux, and Florence Jaffrezic

References

R. Bourgon, R. Gentleman, and W. Huber (2010). Independent filtering increases detection power for high-throughput experiments. PNAS 107(21):9546-9551.

A. Rau, M. Gallopin, G. Celeux, and F. Jaffrezic (2013). Data-based filtering for replicated high-throughput transcriptome sequencing experiments. Bioinformatics, doi: 10.1093/bioinformatics/btt350.

Examples

library(HTSFilter)
library(Biobase)
data("sultan")
conds <- pData(sultan)$cell.line

########################################################################
## Matrix or data.frame
########################################################################

## Filter genes with total (sum) normalized gene counts < 10
filter <- HTSBasicFilter(exprs(sultan), method="sum", cutoff.type="value",
                         cutoff = 10)

########################################################################
## DGEList
########################################################################

library(edgeR)
## Filter genes with CPM values less than 100 in more than 2 samples
dge <- DGEList(counts=exprs(sultan), group=conds)
dge <- calcNormFactors(dge)
filter <- HTSBasicFilter(dge, method="cpm", cutoff.type=2, cutoff=100)

########################################################################
## DESeq2
########################################################################

library(DESeq2)
conds <- gsub(" ", ".", conds)
dds <- DESeqDataSetFromMatrix(countData = exprs(sultan),
                              colData = data.frame(cell.line = conds),
                              design = ~ cell.line)

## Not run: Filter genes with mean normalized gene counts < 40% quantile
## dds <- DESeq(dds)
## filter <- HTSBasicFilter(dds, method="mean", cutoff.type="quantile",
##                          cutoff = 0.4)
## res <- results(filter$filteredData, independentFiltering=FALSE)

HTSFilter documentation built on Nov. 1, 2018, 3:58 a.m.