Calculate a data-based filtering threshold for replicated transcriptome sequencing data through the pairwise Jaccard similarity index between pairs of replicates within each experimental condition.
HTSFilter(x, ...)
## S4 method for signature 'matrix'
HTSFilter(
x,
conds,
s.min = 1,
s.max = 200,
s.len = 100,
loess.span = 0.3,
normalization = c("TMM", "DESeq", "none"),
plot = TRUE,
plot.name = NA,
parallel = FALSE,
BPPARAM = bpparam()
)
## S4 method for signature 'data.frame'
HTSFilter(
x,
conds,
s.min = 1,
s.max = 200,
s.len = 100,
loess.span = 0.3,
normalization = c("TMM", "DESeq", "none"),
plot = TRUE,
plot.name = NA,
parallel = FALSE,
BPPARAM = bpparam()
)
## S4 method for signature 'DGEList'
HTSFilter(
x,
s.min = 1,
s.max = 200,
s.len = 100,
loess.span = 0.3,
normalization = c("TMM", "DESeq", "pseudo.counts", "none"),
plot = TRUE,
plot.name = NA,
parallel = FALSE,
BPPARAM = bpparam(),
conds
)
## S4 method for signature 'DGEExact'
HTSFilter(
x,
DGEList,
s.min = 1,
s.max = 200,
s.len = 100,
loess.span = 0.3,
normalization = c("TMM", "DESeq", "pseudo.counts", "none"),
plot = TRUE,
plot.name = NA,
parallel = FALSE,
BPPARAM = bpparam(),
conds
)
## S4 method for signature 'DGEGLM'
HTSFilter(
x,
s.min = 1,
s.max = 200,
s.len = 100,
loess.span = 0.3,
normalization = c("TMM", "DESeq", "none"),
plot = TRUE,
plot.name = NA,
parallel = FALSE,
BPPARAM = bpparam(),
conds
)
## S4 method for signature 'DGELRT'
HTSFilter(
x,
DGEGLM,
s.min = 1,
s.max = 200,
s.len = 100,
loess.span = 0.3,
normalization = c("TMM", "DESeq", "none"),
plot = TRUE,
plot.name = NA,
parallel = FALSE,
BPPARAM = bpparam(),
conds
)
## S4 method for signature 'DESeqDataSet'
HTSFilter(
x,
s.min = 1,
s.max = 200,
s.len = 100,
loess.span = 0.3,
normalization = c("DESeq", "TMM", "none"),
plot = TRUE,
plot.name = NA,
pAdjustMethod = "BH",
parallel = FALSE,
BPPARAM = bpparam(),
conds
)

x 
A numeric matrix or data.frame representing the counts of dimension (g x n),
for g genes in n samples, a 
... 
Additional optional arguments 
conds 
Vector of length n identifying the experimental condition of each of the n samples; required when x
is a numeric matrix. In the case of objects of class 
s.min 
Minimum value of filtering threshold to be considered, with default value equal to 1 
s.max 
Maximum value of filtering threshold to be considered, with default value equal to 200 
s.len 
Length of sequence of filtering thresholds to be considered (from 
loess.span 
Span of the loess curve to be fitted to the filtering thresholds and corresponding global similarity indices, with default value equal to 0.3 
normalization 
Normalization method to be used to correct for differences in library sizes, with
choices “TMM” (Trimmed Mean of M-values), “DESeq” (normalization method proposed in the
DESeq package), “pseudo.counts” (pseudo-counts obtained via quantile-quantile normalization in
the edgeR package, only available for objects of class 
plot 
If “TRUE”, produce a plot of the calculated global similarity indices against the filtering threshold with superimposed loess curve 
plot.name 
If 
parallel 
If 
BPPARAM 
Optional parameter object passed internally to 
DGEList 
Object of class DGEList, to be used when filtering objects of class DGEExact 
DGEGLM 
Object of class DGEGLM, to be used when filtering objects of class DGELRT 
pAdjustMethod 
The method used to adjust p-values, see 
The Jaccard similarity index, which measures the overlap of two sets, is calculated as follows. Given two binary vectors, each of length n, we define the following values:
a = the number of attributes with a value of 1 in both vectors
b = the number of attributes with a value of 1 in the first vector and 0 in the second
c = the number of attributes with a value of 0 in the first vector and 1 in the second
d = the number of attributes with a value of 0 in both vectors
We note that every attribute falls into exactly one of these four categories, so a+b+c+d=n. Given these quantities, we may calculate the Jaccard similarity index between the two vectors as follows:
J = a/(a+b+c).
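As a minimal illustration of the formula above, the following R sketch counts the four quantities for two binary vectors and returns J. The function name jaccard and the toy vectors are illustrative only and are not part of the HTSFilter package; returning NA when a+b+c = 0 is an assumption of this sketch.

```r
# Jaccard similarity index between two binary (0/1) vectors of equal length.
# Illustrative helper, not an HTSFilter function.
jaccard <- function(u, v) {
  stopifnot(length(u) == length(v))
  n_a <- sum(u == 1 & v == 1)  # a: 1 in both vectors
  n_b <- sum(u == 1 & v == 0)  # b: 1 in the first vector, 0 in the second
  n_c <- sum(u == 0 & v == 1)  # c: 0 in the first vector, 1 in the second
  n_d <- sum(u == 0 & v == 0)  # d: 0 in both vectors (not used in J)
  stopifnot(n_a + n_b + n_c + n_d == length(u))
  # Assumption of this sketch: J is undefined when a + b + c = 0, so return NA.
  if (n_a + n_b + n_c == 0) return(NA_real_)
  n_a / (n_a + n_b + n_c)
}

u <- c(1, 1, 0, 0, 1, 0)
v <- c(1, 0, 1, 0, 1, 0)
jaccard(u, v)  # a = 2, b = 1, c = 1, d = 2, so J = 2/4 = 0.5
```

Note that d does not enter the index, so attributes that are 0 in both vectors do not increase the similarity.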
filteredData 
An object of the same class as x containing the data that passed the filter 
on 
A binary vector of length g, where 1 indicates a gene with normalized expression greater than the optimal filtering threshold s.optimal in at least one sample (irrespective of condition labels), and 0 indicates a gene with normalized expression less than or equal to the optimal filtering threshold in all samples 
s 
The optimal filtering threshold as identified by the global similarity index 
indexValues 
A matrix of dimension (s.len x 2) giving the tested filtering thresholds and the corresponding global similarity indices. Note that the threshold values are equally spaced on the log scale, and thus unequally spaced on the count scale (i.e., we test more threshold values at very low levels of expression, and fewer at very high levels of expression). 
normFactor 
A vector of length n giving the library sizes estimated by the normalization method specified in normalization 
removedData 
A matrix containing the data removed by the filter 
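Two of the returned values can be illustrated with a short R sketch: a grid of thresholds equally spaced on the log scale, as described for indexValues, and the binary on vector for a given threshold. The toy matrix norm_counts and the value s_optimal are made up for illustration, and the grid construction is only one plausible way to obtain log-spaced values; the exact internal computation used by HTSFilter is not shown in this page.

```r
# Toy normalized counts: 5 genes (rows) x 4 samples (columns); values are made up.
norm_counts <- matrix(
  c( 0,  1,  0,  2,
    10, 12,  9, 11,
     3,  0,  0,  1,
    50, 40, 60, 55,
     1,  1,  0,  0),
  nrow = 5, byrow = TRUE
)

# Default argument values taken from the usage above.
s.min <- 1
s.max <- 200
s.len <- 100

# Candidate thresholds equally spaced on the log scale between s.min and s.max
# (assumed construction for this sketch).
thresholds <- exp(seq(log(s.min), log(s.max), length.out = s.len))

# Spacing on the count scale is much finer at low expression than at high expression.
head(diff(thresholds), 3)
tail(diff(thresholds), 3)

# 'on' vector for a hypothetical optimal threshold: 1 if the gene exceeds the
# threshold in at least one sample, 0 if it is at or below it in all samples.
s_optimal <- 2.5
on <- as.integer(apply(norm_counts, 1, max) > s_optimal)
on  # 0 1 1 1 0
```

Under these assumptions, rows with on equal to 1 correspond to filteredData and rows with on equal to 0 correspond to removedData.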
Andrea Rau, Melina Gallopin, Gilles Celeux, and Florence Jaffrezic
R. Bourgon, R. Gentleman, and W. Huber (2010). Independent filtering increases detection power for high-throughput experiments. PNAS, 107(21):9546-9551.
P. Jaccard (1901). Etude comparative de la distribution florale dans une portion des Alpes et des Jura. Bulletin de la Societe Vaudoise des Sciences Naturelles, 37:547-549.
A. Rau, M. Gallopin, G. Celeux, and F. Jaffrezic (2013). Data-based filtering for replicated high-throughput transcriptome sequencing experiments. Bioinformatics, doi: 10.1093/bioinformatics/btt350.
library(Biobase)
data("sultan")
conds <- pData(sultan)$cell.line
########################################################################
## Matrix or data.frame
########################################################################
filter <- HTSFilter(exprs(sultan), conds, s.len=25, plot=FALSE)
########################################################################
## DGEExact
########################################################################
library(edgeR)
dge <- DGEList(counts=exprs(sultan), group=conds)
dge <- calcNormFactors(dge)
dge <- estimateCommonDisp(dge)
dge <- estimateTagwiseDisp(dge)
et <- exactTest(dge)
et <- HTSFilter(et, DGEList=dge, s.len=25, plot=FALSE)$filteredData
## topTags(et)
########################################################################
## DESeq2
########################################################################
library(DESeq2)
conds <- gsub(" ", ".", conds)
dds <- DESeqDataSetFromMatrix(countData = exprs(sultan),
                              colData = data.frame(cell.line = conds),
                              design = ~ cell.line)
## Not run:
##
## dds <- DESeq(dds)
## filter <- HTSFilter(dds, s.len=25, plot=FALSE)$filteredData
## class(filter)
## res <- results(filter, independentFiltering=FALSE)
