Filter: Filter genes

Source: R/filter.R

Description

Filter out genes with low mean expression and low variance.

Usage

Filter(datasets, data.type, del.perc = c(0.3, 0.3), threshold = 1)

Arguments

datasets

a list of gene expression matrices, one matrix per study. Each row of a matrix corresponds to one gene and each column to one sample. The row names are gene symbols.

data.type

a character string to specify the type of data in datasets. It should be "microarray", "RNAseq-FPKM", or "RNAseq-count".

del.perc

a numeric vector of two elements giving the proportion of genes to filter out in each of the two sequential filtering steps when data.type is "microarray" or "RNAseq-FPKM" (for example, 0.3 removes 30% of the genes in that step). The default is c(0.3, 0.3). See Details.

threshold

a numeric value giving the count threshold used when data.type is "RNAseq-count". The default is 1. See Details.

Details

When data.type is "microarray" or "RNAseq-FPKM", gene filtering is performed in two sequential steps. The first step removes genes with very low expression, identified by small average expression across studies. Specifically, within each study the mean intensity of each gene across all samples is calculated and converted to a rank, and these ranks are summed across studies for each gene. Genes whose rank sums fall in the lowest del.perc[1] fraction (for example, the lowest 30% when del.perc[1] is 0.3) are considered unexpressed (i.e., of small expression intensity) and are filtered out. The second step works the same way on the remaining genes but replaces the mean intensity with the standard deviation: genes whose rank sums of standard deviations fall in the lowest del.perc[2] fraction are considered non-informative (small variation) and are filtered out.
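The two-step rank-sum procedure can be sketched in base R as below. This is a minimal illustration under stated assumptions, not the package's implementation: the helper names rank_sum_keep and filter_two_step are hypothetical, ties are ranked with rank()'s default averaging, and the number of genes removed in each step is assumed to be floor(n * del.perc) for the n genes entering that step.

```r
# Hypothetical sketch of the two-step rank-sum filter (not the package's code).
# 'datasets' is a list of numeric matrices (genes x samples) with the same
# genes in the same row order.

# Return the row indices to keep after one rank-sum filtering step.
# 'stat' maps a matrix to one value per gene (e.g. rowMeans).
rank_sum_keep <- function(datasets, stat, del) {
  # Rank the per-gene statistic within each study (smallest value gets rank 1)
  ranks <- sapply(datasets, function(x) rank(stat(x)))
  # Sum the ranks across studies for each gene
  rank.sum <- rowSums(ranks)
  # Assumption: remove floor(n * del) genes with the smallest rank sums
  n.del <- floor(length(rank.sum) * del)
  drop <- order(rank.sum)[seq_len(n.del)]
  setdiff(seq_along(rank.sum), drop)
}

filter_two_step <- function(datasets, del.perc = c(0.3, 0.3)) {
  # Step 1: remove genes with consistently low mean expression
  keep <- rank_sum_keep(datasets, rowMeans, del.perc[1])
  datasets <- lapply(datasets, function(x) x[keep, , drop = FALSE])
  # Step 2: among the remaining genes, remove those with consistently low SD
  row_sd <- function(x) apply(x, 1, sd)
  keep <- rank_sum_keep(datasets, row_sd, del.perc[2])
  lapply(datasets, function(x) x[keep, , drop = FALSE])
}

# Toy data: gene i has mean i and a standard deviation that decreases with i
genes <- paste0("g", 1:10)
pattern <- c(-1, 1, -1, 1)
toy <- outer(1:10, rep(1, 4)) + outer(10:1, pattern)
dimnames(toy) <- list(genes, NULL)
studies <- list(s1 = toy, s2 = 2 * toy)

res <- filter_two_step(studies, del.perc = c(0.3, 0.3))
rownames(res$s1)  # "g4" "g5" "g6" "g7" "g8"
```

Because ranks are summed across studies, a gene is removed only when it is low relative to other genes in most studies; a gene that is low in one study but high in the others keeps a moderate rank sum.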

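When data.type is "RNAseq-count", genes with very low counts are filtered out. For each gene, the mean count across samples is calculated within each study, and the minimum of these means across studies is compared with threshold; genes whose minimum falls below threshold are removed.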
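A minimal base-R sketch of the count filter follows. The helper name count_filter is hypothetical, and it assumes that a gene is kept when the minimum of its study-wise mean counts is at least threshold; the exact comparison used by Filter() is not stated on this page.

```r
# Hypothetical sketch of the RNAseq-count filter (not the package's code).
# Assumption: a gene is kept when the minimum of its study-wise mean counts
# is at least 'threshold'.
count_filter <- function(datasets, threshold = 1) {
  # Mean count of each gene within each study: a genes x studies matrix
  means <- sapply(datasets, rowMeans)
  # Smallest study-wise mean for each gene
  min.mean <- apply(means, 1, min)
  keep <- min.mean >= threshold
  lapply(datasets, function(x) x[keep, , drop = FALSE])
}

# Toy counts for three genes in two studies with different sample sizes
genes <- c("A", "B", "C")
s1 <- matrix(c(0, 0, 1,    # A: mean 1/3
               5, 7, 6,    # B: mean 6
               2, 1, 3),   # C: mean 2
             nrow = 3, byrow = TRUE, dimnames = list(genes, NULL))
s2 <- matrix(c(4, 4,       # A: mean 4
               9, 11,      # B: mean 10
               3, 3),      # C: mean 3
             nrow = 3, byrow = TRUE, dimnames = list(genes, NULL))

res.count <- count_filter(list(s1 = s1, s2 = s2), threshold = 1)
rownames(res.count$s1)  # "B" "C"
```

Gene A is removed even though its mean count is 4 in the second study, because the filter looks at the minimum across studies.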

Value

A list of filtered gene expression matrices, one matrix per study. Each row of a matrix corresponds to one gene and each column to one sample. The row names are gene symbols.

Author(s)

Lin Wang, Schwannden Kuo

Examples

# Load the example datasets and preprocessing options
data(datasets.eg)
data(preproc.option)
# Annotate probes, impute missing values, and pool replicates for one study
SinglePreproc <- function(x) {
  x <- Annotate(dataset=x, id.type = "ProbeID", platform=PLATFORM.hgu133plus2)
  x <- Impute(dataset=x)
  x <- PoolReplicate(dataset=x)
  x
}
datasets.eg <- lapply(datasets.eg, SinglePreproc)
# Merge the preprocessed studies
datasets.eg <- Merge(datasets=datasets.eg)
# Filter a list of matrices
res <- Filter(datasets=datasets.eg, data.type=DTYPE.microarray, del.perc=c(0.3, 0.2))
# Filter a Study object
study <- new("Study", name="test", dtype=DTYPE.microarray, datasets=datasets.eg)
res <- Filter(datasets=study, data.type=DTYPE.microarray, del.perc=c(0.3, 0.2))

metaOmics/preproc documentation built on May 29, 2019, 4:43 a.m.