filterT: FILTERING GENES BEFORE STATISTICAL ANALYSIS
In ISoLDE: Integrative Statistics of alleLe Dependent Expression



Filter lowly expressed genes (or transcripts) according to a data driven threshold, before any statistical analysis. This step is not mandatory but strongly recommended.

filterT(rawASRcounts, normASRcounts, target, tol_filter = 0, bias)

`rawASRcounts`	the `data.frame` containing raw counts (obtained with the `readRawInput` function, or any `data.frame` following the `rawASRcounts` format specifications). The raw count `data.frame` is required when filtering on raw data, or when filtering on normalized data that do not contain 0 counts. (For simplicity, we call '0 count' any value of zero in a count file.)
`normASRcounts`	the `data.frame` containing normalized counts (obtained with the `readNormInput` function, or any `data.frame` following the `normASRcounts` format specifications). We strongly recommend filtering on normalized ASR counts.
`target`	the `data.frame` containing the target metadata (obtained with the `readTarget` function, or any `data.frame` following the `target` format specifications).
`tol_filter`	a value between 0 and 100 that introduces a tolerance rate into the filtering step: if `tol_filter = 25`, all genes having less than 25% of their counts from at least one parental (or strain) origin below the threshold are selected. The default value 0 means that all raw counts from at least one parental (or strain) origin must be above the threshold; 100 means that no filtering is applied.
`bias`	the kind of allele expression bias you want to study. It must be one of `"parental"` or `"strain"`.
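
The tolerance rule described for `tol_filter` can be illustrated on toy data. The following is a minimal base-R sketch, not ISoLDE's implementation: the threshold value, the column labels and the helper `keep_gene` are hypothetical. It assumes that a count strictly below the threshold fails, and that a proportion equal to the tolerance still passes, so that 0 and 100 behave as documented.

```r
# Toy counts: 3 genes x 6 columns (3 replicates per parental origin).
# All names and values are illustrative assumptions.
counts <- matrix(
  c(12, 15, 11,  0,  1,  0,   # geneA: maternal well covered, paternal near 0
     0,  1,  0,  0,  0,  2,   # geneB: near 0 for both origins
     9,  2, 14,  8, 10,  1),  # geneC: one low count per origin
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"),
                  c("mat1", "mat2", "mat3", "pat1", "pat2", "pat3"))
)
origin <- rep(c("maternal", "paternal"), each = 3)
threshold <- 5  # hypothetical; filterT derives its threshold from the data

# Keep a gene if, for at least one origin, the percentage of its counts
# below the threshold does not exceed tol_filter (assumed reading of the rule).
keep_gene <- function(x, origin, threshold, tol_filter = 0) {
  pct_below <- tapply(x < threshold, origin, mean) * 100
  any(pct_below <= tol_filter)
}

keep_strict  <- apply(counts, 1, keep_gene, origin = origin,
                      threshold = threshold, tol_filter = 0)
keep_lenient <- apply(counts, 1, keep_gene, origin = origin,
                      threshold = threshold, tol_filter = 40)
keep_all     <- apply(counts, 1, keep_gene, origin = origin,
                      threshold = threshold, tol_filter = 100)
```

In this toy setting, geneC has one of three counts below the threshold for each origin, so it is dropped with `tol_filter = 0` and retained once the tolerance exceeds roughly 33%.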

Filtering before statistical analysis is recommended to avoid considering genes (or transcripts) that carry too little information, and thus to avoid an overly strong effect of the multiple-testing correction.

The aim of our filtering method is to eliminate insufficiently quantified genes from the analysis, that is, genes having mostly counts of 0 or near 0 for each replicate in at least one condition (parent, strain). To this end, the filterT function searches for the distribution of the counts of a gene in a condition when most of the read counts are 0 for this condition. This distribution is used to define a threshold, and genes whose counts fall below this threshold are eliminated.
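
One way to picture this idea in base R, as a rough sketch only: collect the non-zero counts observed in gene-by-condition groups where most replicates are 0, and take a high quantile of that background distribution as the threshold. The cut-off used for "most" (more than half), the 95% quantile and all object names are assumptions made for illustration; the rule actually applied by `filterT` may differ.

```r
# Toy counts: 4 genes x 6 columns (3 replicates per parental origin).
counts <- matrix(
  c( 0,  0,  1, 20, 25, 22,
     0,  2,  0,  0,  0,  3,
    30, 28, 35,  0,  0,  0,
     0,  0,  0,  0,  4,  0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4),
                  c("mat1", "mat2", "mat3", "pat1", "pat2", "pat3"))
)
origin <- rep(c("maternal", "paternal"), each = 3)

# For each origin, select genes whose replicates are mostly 0 (assumed: > 50%)
# and gather the non-zero counts they still show.
background <- unlist(lapply(unique(origin), function(o) {
  sub <- counts[, origin == o, drop = FALSE]
  mostly_zero <- rowMeans(sub == 0) > 0.5
  vals <- sub[mostly_zero, , drop = FALSE]
  vals[vals > 0]
}))

# Assumed choice: a high quantile of the background counts as the threshold.
threshold <- quantile(background, probs = 0.95, names = FALSE)
```

A threshold obtained this way reflects the count levels that occur in groups that are otherwise silent, which is why it adapts to the sequencing depth of the data set instead of being fixed in advance.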

The filtering step is not mandatory but strongly recommended.

A list of two `data.frame`s:

`filteredASRcounts`	This `data.frame` contains ASR counts that have successfully passed the filtering step.
`removedASRcounts`	This `data.frame` contains ASR counts that have been removed by the filtering step.

Each line represents a feature (e.g. a gene or transcript). Each column contains the number of allele-specific sense reads from either the paternal or the maternal parent for a given biological replicate, so two columns are expected per biological replicate.

filterT output on normalized data is the typical input for isolde_test.

A minimal filtering step will always be performed while applying the isolde_test function. It consists of eliminating all genes not satisfying these two conditions:
- At least one of the two medians (of paternal or maternal ASR counts) is different from 0;
- There is at least one ASR count (different from 0) in each cross.
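
The two conditions above can be sketched in base R on toy data. This is an illustration, not the code run by `isolde_test`: the column annotations `parent` and `cross`, the toy counts and the helper `passes_minimal_filter` are hypothetical.

```r
# Toy counts: 4 genes x 8 columns (2 crosses x 2 replicates x 2 parents).
counts <- matrix(
  c(5, 0, 7, 0, 6, 1, 8, 0,   # gene1: maternal median > 0, reads in both crosses
    0, 0, 0, 0, 0, 0, 0, 0,   # gene2: no reads at all
    0, 0, 0, 0, 3, 4, 5, 2,   # gene3: no reads in cross1
    1, 0, 0, 0, 0, 0, 0, 2),  # gene4: both parental medians equal 0
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), NULL)
)
parent <- rep(c("maternal", "paternal"), times = 4)  # origin of each column
cross  <- rep(c("cross1", "cross2"), each = 4)       # cross of each column

passes_minimal_filter <- function(x, parent, cross) {
  medians <- tapply(x, parent, median)           # one median per parent
  nonzero_per_cross <- tapply(x != 0, cross, any)
  # Condition 1: at least one parental median differs from 0.
  # Condition 2: at least one non-zero count in each cross.
  any(medians != 0) && all(nonzero_per_cross)
}

kept <- apply(counts, 1, passes_minimal_filter, parent = parent, cross = cross)
```

Only gene1 satisfies both conditions here; gene3 fails the second condition and gene4 fails the first.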

Marine Rohmer marine.rohmer@mgx.cnrs.fr,
Christelle Reynès christelle.reynes@igf.cnrs.fr

Reynès, C. et al. (2016): ISoLDE: a new method for identification of allelic imbalance. Submitted

# Loading the package and all required data.frames
library(ISoLDE)
data(rawASRcounts)
data(normASRcounts)
data(target)

# Filtering genes from the ASR count data.frame in parental bias study
res_filterT <- filterT(rawASRcounts = rawASRcounts,
                       normASRcounts = normASRcounts,
                       target = target, bias = "parental")
filteredASRcounts <- res_filterT$filteredASRcounts
removedASRcounts <- res_filterT$removedASRcounts