applySeqGate: Performs a Filtering of Lowly Expressed Features


View source: R/seqgate.R

Description

Implements the SeqGate method to filter lowly expressed features (e.g. genes).

Usage

    applySeqGate(readCountsSE, assayName, colCond, prop0=NULL, percentile=NULL, 
                propUpThresh=0.9)

Arguments

readCountsSE

SummarizedExperiment object including the assayName assay, that is a matrix of read counts. Rows correspond to features (genes or other genomic features) and columns correspond to sample libraries. The read counts must contain zeros, as the SeqGate method is based on the distribution of counts in replicates along with zero counts.

assayName

character string giving the name of the assay in readCountsSE for which we want to perform the filtering.

colCond

character string giving the name of the column, in the colData DataFrame of readCountsSE, that describes the biological condition to which each sample belongs.

prop0

minimal proportion of zeros within a condition for a feature to be considered not expressed or lowly expressed in that condition. prop0 must be > 0 and < 1. See details for default setting.

percentile

percentile used on the 'max' distribution to determine the filtering threshold value. percentile must be >= 0 and <= 1. See details for default setting.

propUpThresh

proportion of counts that must be above the threshold in at least one condition for the feature to be kept. propUpThresh must be > 0 and <= 1. Default is 0.9.
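
The documented ranges for the three tuning arguments can be checked before calling the function. The snippet below is only a sketch in base R: checkSeqGateArgs is a hypothetical helper that does not exist in SeqGate, and it assumes that NULL means "use the default setting" for prop0 and percentile, as the Usage section suggests.

```r
# Hypothetical helper, NOT part of SeqGate: checks the documented ranges.
# NULL is assumed to mean "let applySeqGate choose the default".
checkSeqGateArgs <- function(prop0 = NULL, percentile = NULL,
                             propUpThresh = 0.9) {
    stopifnot(
        # prop0 must lie strictly between 0 and 1
        is.null(prop0) || (prop0 > 0 && prop0 < 1),
        # percentile may take the boundary values 0 and 1
        is.null(percentile) || (percentile >= 0 && percentile <= 1),
        # propUpThresh may be 1 but not 0
        propUpThresh > 0 && propUpThresh <= 1
    )
    invisible(TRUE)
}

checkSeqGateArgs(prop0 = 2/3, percentile = 0.9)
```

An out-of-range value, such as prop0 = 1, makes the helper stop with an error.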

Details

In order to find a threshold value to filter lowly expressed features, SeqGate analyzes the distribution of counts found in replicates along with zero counts. More specifically, features with a proportion of at least prop0 zeros in one condition are selected. The distribution of counts found in replicates of that same condition along with those zeros is computed. The chosen threshold is the count value corresponding to the given percentile of this distribution. Finally, features having a proportion propUpThresh of replicates with counts below that value in all conditions are filtered out.
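
The steps above can be sketched in base R on a plain count matrix. This is one possible reading of the description, not the SeqGate source code: sketchSeqGate and its arguments are hypothetical names, the 'max' distribution is assumed to be the largest count observed per selected feature and condition, and the keep rule follows the propUpThresh argument description (a proportion of at least propUpThresh of the replicates strictly above the threshold in at least one condition).

```r
# Hypothetical sketch, NOT the SeqGate implementation.
# counts: numeric matrix, features in rows, samples in columns.
# cond:   condition label of each column.
sketchSeqGate <- function(counts, cond, prop0, percentile,
                          propUpThresh = 0.9) {
    conds <- unique(cond)
    # For each condition, select features with at least prop0 zeros and
    # record the largest count seen alongside those zeros (assumption).
    maxAlongZeros <- unlist(lapply(conds, function(cn) {
        sub <- counts[, cond == cn, drop = FALSE]
        sel <- rowMeans(sub == 0) >= prop0
        apply(sub, 1, max)[sel]
    }))
    # Threshold: requested percentile of that distribution.
    threshold <- unname(quantile(maxAlongZeros, probs = percentile))
    # Keep a feature if, in at least one condition, a proportion of at
    # least propUpThresh of its replicates is above the threshold.
    keep <- Reduce(`|`, lapply(conds, function(cn) {
        sub <- counts[, cond == cn, drop = FALSE]
        rowMeans(sub > threshold) >= propUpThresh
    }))
    list(threshold = threshold, onFilter = keep)
}

# Toy data: 4 features, 3 replicates each of conditions A and B.
counts <- rbind(
    g1 = c(0,   0,   3,  0,  0,  0),
    g2 = c(0,   5,   0,  50, 60, 70),
    g3 = c(100, 120, 90, 0,  0,  2),
    g4 = c(10,  12,  11, 9,  8,  10)
)
cond <- c("A", "A", "A", "B", "B", "B")
res <- sketchSeqGate(counts, cond, prop0 = 2/3, percentile = 0.9)
res$onFilter
```

In this toy data set, g1 is never consistently above the threshold in either condition, so it is the only feature flagged for removal.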

If prop0 is not provided, it is set to the maximum number of replicates in a condition minus one, divided by that maximum number of replicates. For example, if one condition of the experiment has 2 replicates and the other condition has 4 replicates, prop0 will be set to (4-1)/4=0.75.
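
The default for prop0 described above can be reproduced in a few lines of base R. This is a sketch under that description only; defaultProp0 is a hypothetical helper and is not exported by SeqGate.

```r
# Hypothetical helper, NOT part of SeqGate: default prop0 as described
# in the Details section.
defaultProp0 <- function(cond) {
    nRep <- table(cond)   # number of replicates per condition
    maxRep <- max(nRep)   # size of the largest condition
    (maxRep - 1) / maxRep
}

# 2 replicates of condition A and 4 of condition B, as in the example
# given in the text: (4 - 1) / 4
cond <- c("A", "A", "B", "B", "B", "B")
defaultProp0(cond)
```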

If percentile is not provided, it is set to 0.90 if at least one of the conditions has 5 replicates or fewer. Otherwise, it is set to a value between 0.5 and 0.9, depending on the number of replicates.

Value

The input SummarizedExperiment with the following added elements:

rowData(readCountsSE)$onFilter

logical vector of length nrow(readCountsSE) indicating whether each feature is kept after filtering (TRUE) or not (FALSE).

metadata(readCountsSE)$threshold

applied filter threshold.

Author(s)

Christelle Reynès christelle.reynes@igf.cnrs.fr,
Stéphanie Rialle stephanie.rialle@mgx.cnrs.fr

References

Rialle, R. et al. (2020): SeqGate: a bioconductor package to perform data-driven filtering of RNAseq datasets (manuscript in preparation)

Examples

    # Loading the packages used below
    library(SeqGate)
    library(SummarizedExperiment)
    # Loading of input data frame
    data(data_MiTF_1000genes)
    # Annotating conditions (one label per sample column)
    cond <- c("A","A","B","B","A","B")
    # Setting the SummarizedExperiment input
    rowData <- DataFrame(row.names=rownames(data_MiTF_1000genes))
    colData <- DataFrame(Conditions=cond)
    counts_strub <- SummarizedExperiment(
                        assays=list(counts=data_MiTF_1000genes),
                        rowData=rowData,
                        colData=colData)
    # Applying SeqGate with default prop0, percentile and propUpThresh
    counts_strub <- applySeqGate(counts_strub,"counts","Conditions")
    # Getting the matrix of kept genes after filtering
    keptGenes <- assay(counts_strub[rowData(counts_strub)$onFilter,])

SeqGate documentation built on Jan. 24, 2021, 2:03 a.m.