Implements the SeqGate methods to filter lowly expressed features (e.g. genes)
applySeqGate(readCountsSE, assayName, colCond, prop0=NULL, percentile=NULL,
             propUpThresh=0.9)
readCountsSE |
|
assayName |
character string giving the name of the assay in |
colCond |
character string giving the name of the column describing the biological
conditions to which the samples belong, in the colData DataFrame
of |
prop0 |
minimal proportion of zeros within a condition for the feature to be
considered not expressed or lowly expressed. |
percentile |
percentile used on the 'max' distribution to determine the filtering
threshold value. |
propUpThresh |
proportion of counts that must be above the threshold in at least one
condition for the feature to be kept. |
To find a threshold value for filtering lowly expressed features, SeqGate
analyzes the distribution of counts found in replicates alongside zero
counts. More specifically, features with a proportion of at least prop0
zeros in one condition are selected. The distribution of counts found in
replicates of that same condition along with those zeros is computed. The
chosen threshold is the count value corresponding to the given percentile
of this distribution. Finally, a feature is kept only if, in at least one
condition, a proportion of at least propUpThresh of its replicates have
counts above that value; all other features are filtered out.
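The procedure described above can be sketched in base R. This is only one
possible reading of the description, not the SeqGate implementation: the
function name seqgate_sketch is hypothetical, the distribution is assumed
to be made of the per-feature maximum count in each condition that has
enough zeros (features with only zeros contribute a maximum of 0), counts
are assumed to have to be strictly above the threshold, and the default
quantile() interpolation is used.

```r
# Hypothetical helper, NOT part of SeqGate: a base-R reading of the
# filtering procedure described in the Details section.
seqgate_sketch <- function(counts, cond, prop0, percentile,
                           propUpThresh = 0.9) {
  cond <- as.character(cond)
  conds <- unique(cond)

  # Step 1: in each condition, select the features whose proportion of
  # zeros is at least prop0 and record their maximum count (assumption).
  maxima <- unlist(lapply(conds, function(k) {
    sub <- counts[, cond == k, drop = FALSE]
    lowly <- which(rowMeans(sub == 0) >= prop0)
    vapply(lowly, function(i) as.numeric(max(sub[i, ])), numeric(1))
  }))

  # Step 2: the threshold is the requested percentile of that distribution.
  threshold <- quantile(maxima, probs = percentile, names = FALSE)

  # Step 3: keep a feature if, in at least one condition, the proportion of
  # replicates strictly above the threshold is at least propUpThresh.
  keep <- Reduce(`|`, lapply(conds, function(k) {
    rowMeans(counts[, cond == k, drop = FALSE] > threshold) >= propUpThresh
  }))

  list(threshold = threshold, keep = keep)
}

# Toy data (made up for illustration): 5 features, 2 conditions x 2 replicates
toy <- matrix(c(  0,   3,  50,  60,
                  0,   0,   0,   2,
                100, 120,  90,  80,
                  0,   1,   0,   0,
                 10,   0, 200, 210),
              nrow = 5, byrow = TRUE,
              dimnames = list(paste0("g", 1:5), NULL))
res <- seqgate_sketch(toy, cond = c("A", "A", "B", "B"),
                      prop0 = 0.5, percentile = 0.9)
res$threshold
res$keep
```

For real analyses, applySeqGate should be used; the sketch only helps to
see how prop0, percentile and propUpThresh interact.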
If prop0 is not provided, it is set to (n - 1)/n, where n is the largest
number of replicates in any condition. For example, if one condition of
the experiment has 2 replicates and the other condition has 4 replicates,
prop0 will be set to (4-1)/4=0.75.
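The default described above can be reproduced in base R. The helper name
default_prop0 is hypothetical and not part of SeqGate; it assumes the
default is (n - 1)/n, with n the largest number of replicates in a
condition, as the worked example suggests.

```r
# Hypothetical helper, NOT part of SeqGate: default prop0 computed from the
# vector of condition labels (one label per sample).
default_prop0 <- function(cond) {
  n_max <- max(table(cond))   # largest number of replicates in a condition
  (n_max - 1) / n_max
}

# One condition with 2 replicates and one with 4, as in the example above
p0 <- default_prop0(c("A", "A", "B", "B", "B", "B"))
p0
```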
If percentile is not provided, it is set to 0.90 if at least one of the
conditions has 5 replicates or fewer. Otherwise, it is set to a value
between 0.5 and 0.9, depending on the number of replicates.
The input SummarizedExperiment
with the following added elements:
rowData(readCountsSE)$onFilter |
vector which |
metadata(readCountsSE)$threshold |
applied filter threshold. |
Christelle Reynès christelle.reynes@igf.cnrs.fr,
Stéphanie Rialle stephanie.rialle@mgx.cnrs.fr
Rialle, R. et al. (2020): SeqGate: a bioconductor package to perform data-driven filtering of RNAseq datasets (manuscript in preparation)
# Loading the required packages
library(SeqGate)
library(SummarizedExperiment)
# Loading of input data frame
data(data_MiTF_1000genes)
# Annotating conditions
cond <- c("A","A","B","B","A","B")
# Setting the SummarizedExperiment input
rowData <- DataFrame(row.names=rownames(data_MiTF_1000genes))
colData <- DataFrame(Conditions=cond)
counts_strub <- SummarizedExperiment(
    assays=list(counts=data_MiTF_1000genes),
    rowData=rowData,
    colData=colData)
# Applying SeqGate
counts_strub <- applySeqGate(counts_strub,"counts","Conditions")
# Getting the matrix of kept genes after filtering
keptGenes <- assay(counts_strub[rowData(counts_strub)$onFilter == TRUE,])