filterT: FILTERING GENES BEFORE STATISTICAL ANALYSIS

Description

Filter out lowly expressed genes (or transcripts) according to a data-driven threshold, before any statistical analysis. This step is not mandatory but is strongly recommended.

Usage

filterT(rawASRcounts, normASRcounts, target, tol_filter = 0, bias)

Arguments

rawASRcounts

the data.frame containing raw counts (obtained with the readRawInput function, or any data.frame following the rawASRcounts format specifications).
The raw count data.frame is required when filtering on raw data, or when filtering on normalized data that do not contain 0 counts. (For simplicity, we call '0 count' any value of zero in a count file.)

normASRcounts

the data.frame containing normalized counts (obtained with the readNormInput function, or any data.frame following the normASRcounts format specifications).
We strongly recommend filtering on normalized ASR counts.

target

the data.frame containing the target metadata (obtained with the readTarget function, or any data.frame following the target format specifications).

tol_filter

a value between 0 and 100 that introduces a tolerance rate into the filtering step: if tol_filter = 25, all genes having less than 25% of their counts below the threshold for at least one parental (or strain) origin are selected. The default value 0 means that all raw counts from at least one parental (or strain) origin must be above the threshold; 100 means that no filtering is applied.
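
The tolerance rule can be illustrated on a single gene with a few lines of base R. This is only a sketch of the rule as described above, not the code used inside filterT: the helper keep_gene, its arguments and the toy threshold of 5 are hypothetical, and it assumes that a count strictly lower than the threshold is "below" and that a percentage exactly equal to the tolerance is accepted.

```r
# Hypothetical helper (not part of ISoLDE): applies the tolerance rule to one gene.
# counts:    numeric vector of counts for the gene
# origin:    parental (or strain) origin of each count
# threshold: filtering threshold, assumed to be already known
# tol:       tolerance in percent, between 0 and 100
keep_gene <- function(counts, origin, threshold, tol = 0) {
  # Percentage of counts below the threshold, computed per origin
  pct_below <- tapply(counts < threshold, origin, function(x) 100 * mean(x))
  # Keep the gene if at least one origin stays within the tolerance
  any(pct_below <= tol)
}

origin <- rep(c("maternal", "paternal"), each = 4)
# maternal: 1 count out of 4 is below 5; paternal: all 4 counts are below 5
counts <- c(12, 9, 1, 15, 0, 2, 1, 0)

strict   <- keep_gene(counts, origin, threshold = 5, tol = 0)    # FALSE
tolerant <- keep_gene(counts, origin, threshold = 5, tol = 25)   # TRUE
no_filter <- keep_gene(counts, origin, threshold = 5, tol = 100) # TRUE
```

With tol = 0 the gene is dropped because neither origin has all its counts above the threshold; raising the tolerance to 25 lets the maternal counts rescue it.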

bias

The kind of allele expression bias you want to study. It must be either "parental" or "strain".

Details

Filtering before statistical analysis is recommended to avoid considering genes (or transcripts) that carry too little information, and thus to keep the multiple testing correction from being overly severe.

The aim of our filtering method is to eliminate genes that are not sufficiently quantified, that is, genes whose counts are mostly 0 or near 0 across replicates in at least one condition (parent, strain). To this end, the filterT function looks at the distribution of counts of a gene in a condition when most of the read counts are 0 for this condition. This distribution is used to define a threshold, and genes with counts below this threshold are eliminated.
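
To make the idea of a data-driven threshold more concrete, here is one possible reading of it on toy data. It is not the ISoLDE implementation: the toy matrix, the "more than half of the counts are 0" rule and the choice of the 95% quantile are all illustrative assumptions.

```r
# Toy count matrix: 4 genes x 6 samples (3 replicates per condition)
counts <- matrix(c(
   0,  0,  1, 20, 25, 22,   # gene1: mostly 0 in condition A
  30, 28, 35,  0,  2,  0,   # gene2: mostly 0 in condition B
  15, 18, 12, 14, 16, 19,   # gene3: well quantified everywhere
   0,  0,  0,  0,  3,  0    # gene4: mostly 0 in both conditions
), nrow = 4, byrow = TRUE, dimnames = list(paste0("gene", 1:4), NULL))
condition <- rep(c("A", "B"), each = 3)

# Pool the counts of every gene/condition pair in which
# more than half of the replicates are 0 (assumed rule)
mostly_zero <- unlist(lapply(unique(condition), function(cond) {
  sub <- counts[, condition == cond, drop = FALSE]
  is_low <- rowMeans(sub == 0) > 0.5
  as.vector(sub[is_low, ])
}))

# Assumption: take an upper quantile of this "background" distribution
# as the threshold below which a count is considered uninformative
threshold <- quantile(mostly_zero, probs = 0.95, names = FALSE)
```

Here the pooled distribution is built from gene1 and gene4 in condition A and from gene2 and gene4 in condition B; the well-quantified gene3 never contributes to it.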

The filtering step is not mandatory but strongly recommended.

Value

A list of two data.frames:

filteredASRcounts

This data.frame contains ASR counts that have successfully passed the filtering step.

removedASRcounts

This data.frame contains ASR counts that have been removed by the filtering step.



Each line represents a feature (e.g. a gene or transcript). Each column holds the number of allele-specific sense reads from either the paternal or the maternal parent for a given biological replicate, so you should expect two columns per biological replicate.

Note

filterT output on normalized data is the typical input for isolde_test.

Note

A minimal filtering step will always be performed while applying the isolde_test function. It consists of eliminating all genes not satisfying these two conditions:
- At least one of the two medians (of paternal or maternal ASR counts) is different from 0;
- There is at least one ASR count (different from 0) in each cross.
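
The two conditions above can be checked for a single gene as in the following sketch. It is not the code run by isolde_test: the helper passes_minimal_filter and the parent and cross label vectors are hypothetical, and the way a "cross" is encoded here is an assumption made for illustration.

```r
# Hypothetical helper (not part of ISoLDE): checks the two minimal conditions.
# counts: ASR counts of one gene
# parent: parental origin of each count ("maternal" or "paternal")
# cross:  label of the cross each count comes from (assumed encoding)
passes_minimal_filter <- function(counts, parent, cross) {
  # Condition 1: at least one of the two parental medians differs from 0
  cond1 <- any(tapply(counts, parent, median) != 0)
  # Condition 2: every cross has at least one non-zero count
  cond2 <- all(tapply(counts != 0, cross, any))
  cond1 && cond2
}

parent <- rep(c("maternal", "paternal"), times = 4)
cross  <- rep(c("cross1", "cross2"), each = 4)

ok         <- passes_minimal_filter(c(5, 0, 7, 1, 6, 0, 4, 0), parent, cross)  # TRUE
empty_cross <- passes_minimal_filter(c(5, 0, 7, 1, 0, 0, 0, 0), parent, cross) # FALSE: cross2 is all 0
zero_median <- passes_minimal_filter(c(5, 0, 0, 0, 0, 0, 0, 3), parent, cross) # FALSE: both medians are 0
```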

Author(s)

Marine Rohmer marine.rohmer@mgx.cnrs.fr,
Christelle Reynès christelle.reynes@igf.cnrs.fr

References

Reynès, C. et al. (2016): ISoLDE: a new method for identification of allelic imbalance. Submitted

Examples

# Loading all required data.frames
data(rawASRcounts)
data(normASRcounts)
data(target)

# Filtering genes from the ASR count data.frame in parental bias study
res_filterT <- filterT(rawASRcounts = rawASRcounts,
                       normASRcounts = normASRcounts,
                       target = target, bias="parental")
filteredASRcounts <- res_filterT$filteredASRcounts
removedASRcounts <- res_filterT$removedASRcounts

ISoLDE documentation built on Jan. 10, 2021, 2:01 a.m.