HTSFilter: Calculate data-based filtering threshold for replicated...

Description

Calculate a data-based filtering threshold for replicated transcriptome sequencing data through the pairwise Jaccard similarity index between pairs of replicates within each experimental condition.

Usage

HTSFilter(x, ...)

## S4 method for signature 'matrix'
HTSFilter(
  x,
  conds,
  s.min = 1,
  s.max = 200,
  s.len = 100,
  loess.span = 0.3,
  normalization = c("TMM", "DESeq", "none"),
  plot = TRUE,
  plot.name = NA,
  parallel = FALSE,
  BPPARAM = bpparam()
)

## S4 method for signature 'data.frame'
HTSFilter(
  x,
  conds,
  s.min = 1,
  s.max = 200,
  s.len = 100,
  loess.span = 0.3,
  normalization = c("TMM", "DESeq", "none"),
  plot = TRUE,
  plot.name = NA,
  parallel = FALSE,
  BPPARAM = bpparam()
)

## S4 method for signature 'DGEList'
HTSFilter(
  x,
  s.min = 1,
  s.max = 200,
  s.len = 100,
  loess.span = 0.3,
  normalization = c("TMM", "DESeq", "pseudo.counts", "none"),
  plot = TRUE,
  plot.name = NA,
  parallel = FALSE,
  BPPARAM = bpparam(),
  conds
)

## S4 method for signature 'DGEExact'
HTSFilter(
  x,
  DGEList,
  s.min = 1,
  s.max = 200,
  s.len = 100,
  loess.span = 0.3,
  normalization = c("TMM", "DESeq", "pseudo.counts", "none"),
  plot = TRUE,
  plot.name = NA,
  parallel = FALSE,
  BPPARAM = bpparam(),
  conds
)

## S4 method for signature 'DGEGLM'
HTSFilter(
  x,
  s.min = 1,
  s.max = 200,
  s.len = 100,
  loess.span = 0.3,
  normalization = c("TMM", "DESeq", "none"),
  plot = TRUE,
  plot.name = NA,
  parallel = FALSE,
  BPPARAM = bpparam(),
  conds
)

## S4 method for signature 'DGELRT'
HTSFilter(
  x,
  DGEGLM,
  s.min = 1,
  s.max = 200,
  s.len = 100,
  loess.span = 0.3,
  normalization = c("TMM", "DESeq", "none"),
  plot = TRUE,
  plot.name = NA,
  parallel = FALSE,
  BPPARAM = bpparam(),
  conds
)

## S4 method for signature 'DESeqDataSet'
HTSFilter(
  x,
  s.min = 1,
  s.max = 200,
  s.len = 100,
  loess.span = 0.3,
  normalization = c("DESeq", "TMM", "none"),
  plot = TRUE,
  plot.name = NA,
  pAdjustMethod = "BH",
  parallel = FALSE,
  BPPARAM = bpparam(),
  conds
)

Arguments

x

A numeric matrix or data.frame representing the counts of dimension (g x n), for g genes in n samples, a DGEList object, a DGEExact object, a DGEGLM object, a DGELRT object, or a DESeqDataSet object.

...

Additional optional arguments

conds

Vector of length n identifying the experimental condition of each of the n samples; required when x is a numeric matrix. In the case of objects of class DGEList, DGEExact, DGEGLM, DGELRT, or DESeqDataSet, the design matrix is automatically

s.min

Minimum value of filtering threshold to be considered, with default value equal to 1

s.max

Maximum value of filtering threshold to be considered, with default value equal to 200

s.len

Length of sequence of filtering thresholds to be considered (from s.min to s.max) for the calculation of the global similarity index

loess.span

Span of the loess curve to be fitted to the filtering thresholds and corresponding global similarity indices, with default value equal to 0.3

normalization

Normalization method to be used to correct for differences in library sizes, with choices “TMM” (Trimmed Mean of M-values), “DESeq” (normalization method proposed in the DESeq package), “pseudo.counts” (pseudo-counts obtained via quantile-quantile normalization in the edgeR package, only available for objects of class DGEList and DGEExact), and “none” (to be used only if user is certain no normalization is required, or if data have already been pre-normalized by an alternative method)

plot

If TRUE, produce a plot of the calculated global similarity indices against the filtering threshold, with the fitted loess curve superimposed

plot.name

If plot = TRUE, the name of the PDF file to be saved to the current working directory. If plot.name = NA, the plot is drawn in the current window.

parallel

If FALSE, no parallelization is used. If TRUE, computations run in parallel using BiocParallel (see the next argument, BPPARAM). When running in parallel, it may be advantageous to remove large, unneeded objects from the current R environment before calling the function, as R's internal garbage collection may copy these objects while running on worker nodes.

BPPARAM

Optional parameter object passed internally to bplapply when parallel=TRUE. If not specified, the parameters last registered with register will be used.

DGEList

Object of class DGEList, to be used when filtering objects of class DGEExact

DGEGLM

Object of class DGEGLM, to be used when filtering objects of class DGELRT

pAdjustMethod

The method used to adjust p-values, see ?p.adjust
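
The tuning arguments s.min, s.max, s.len and loess.span can be pictured with a small base-R sketch. It is an illustration under stated assumptions only, not the package's internal code: the similarity values are invented, the names s.test, toy.index and s.opt are hypothetical, and choosing the threshold at the maximum of the fitted curve is an assumption that this page does not spell out.

```r
# Default values taken from the Usage section.
s.min <- 1
s.max <- 200
s.len <- 100
loess.span <- 0.3

# Candidate thresholds: s.len equally spaced values from s.min to s.max.
s.test <- seq(s.min, s.max, length.out = s.len)

# Toy similarity values (invented for illustration): a smooth curve with a
# single peak, plus a small deterministic wiggle standing in for noise.
toy.index <- exp(-((log(s.test) - log(40))^2) / 2) + 0.01 * sin(s.test)

# Fit a loess curve to the (threshold, similarity) pairs.
fit <- loess(toy.index ~ s.test, span = loess.span)
smoothed <- predict(fit)

# ASSUMPTION: take the threshold at which the fitted curve is highest.
s.opt <- s.test[which.max(smoothed)]
s.opt
```

A larger s.len gives a finer grid of candidate thresholds at the cost of more computation, which is presumably why the examples on this page use s.len=25 to keep run time short.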

Details

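The Jaccard similarity index, which measures the overlap of two sets, is calculated as follows. Given two binary vectors, each of length n, we define the following values:

a = the number of attributes with a value of 1 in both vectors

b = the number of attributes with a value of 1 in the first vector and 0 in the second

c = the number of attributes with a value of 0 in the first vector and 1 in the second

d = the number of attributes with a value of 0 in both vectors

Every attribute is counted in exactly one of these four quantities, so a+b+c+d=n. Given these quantities, we may calculate the Jaccard similarity index between the two vectors as follows: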

J = a/(a+b+c).
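
As a worked illustration of this formula, the following base-R sketch counts the quantities for two toy binary vectors and returns J. The function name jaccard and the example vectors are hypothetical and are not part of the HTSFilter API.

```r
# Two binary vectors of equal length n = 8 (toy values, invented here).
v1 <- c(1, 1, 0, 1, 0, 0, 1, 0)
v2 <- c(1, 0, 0, 1, 1, 0, 1, 0)

# Hypothetical helper: Jaccard similarity index of two binary vectors.
jaccard <- function(x, y) {
  stopifnot(length(x) == length(y))
  n.a <- sum(x == 1 & y == 1)  # a: 1 in both vectors
  n.b <- sum(x == 1 & y == 0)  # b: 1 in the first vector only
  n.c <- sum(x == 0 & y == 1)  # c: 1 in the second vector only
  # d (0 in both vectors) does not enter the formula.
  n.a / (n.a + n.b + n.c)      # NaN if neither vector contains a 1
}

jaccard(v1, v2)  # a = 3, b = 1, c = 1, so J = 3/5 = 0.6
```

Because d is ignored, attributes that are 0 in both vectors do not inflate the similarity.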

Value

Author(s)

Andrea Rau, Melina Gallopin, Gilles Celeux, and Florence Jaffrezic

References

R. Bourgon, R. Gentleman, and W. Huber. (2010) Independent filtering increases detection power for high-throughput experiments. PNAS 107(21):9546-9551.

P. Jaccard (1901). Etude comparative de la distribution florale dans une portion des Alpes et des Jura. Bulletin de la Societe Vaudoise des Sciences Naturelles, 37:547-549.

A. Rau, M. Gallopin, G. Celeux, F. Jaffrezic (2013). Data-based filtering for replicated high-throughput transcriptome sequencing experiments. Bioinformatics, doi: 10.1093/bioinformatics/btt350.

Examples

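library(HTSFilter)  # provides HTSFilter() and the "sultan" example data
library(Biobase)    # provides exprs() and pData()
data("sultan")
conds <- pData(sultan)$cell.line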

########################################################################
## Matrix or data.frame
########################################################################

filter <- HTSFilter(exprs(sultan), conds, s.len=25, plot=FALSE)

########################################################################
## DGEExact
########################################################################

library(edgeR)
dge <- DGEList(counts=exprs(sultan), group=conds)
dge <- calcNormFactors(dge)
dge <- estimateCommonDisp(dge)
dge <- estimateTagwiseDisp(dge)
et <- exactTest(dge)
et <- HTSFilter(et, DGEList=dge, s.len=25, plot=FALSE)$filteredData
## topTags(et)


########################################################################
## DESeq2
########################################################################

library(DESeq2)
conds <- gsub(" ", ".", conds)
dds <- DESeqDataSetFromMatrix(countData = exprs(sultan),
                              colData = data.frame(cell.line = conds),
                              design = ~ cell.line)
## Not run:
##
## dds <- DESeq(dds)
## filter <- HTSFilter(dds, s.len=25, plot=FALSE)$filteredData
## class(filter)
## res <- results(filter, independentFiltering=FALSE)

andreamrau/HTSFilter documentation built on Dec. 7, 2020, 2:44 a.m.