quantileDiscretize: Discretize expression matrix for qualitative biclustering


Description

Discretizes a gene expression matrix by performing recursive quantilizations of each gene's expression across samples. The quantile parameter q determines the estimated proportion of differentially expressed genes (2q when both up- and down-regulations are counted). The rank parameter r determines how many discrete levels differentially expressed genes (or outliers) have. See Details below.

Usage

quantileDiscretize(x, ...)

Arguments

x

An object of class eSet or of a class inheriting from it; the most commonly used form is an ExpressionSet object. Alternatively, it can be a numeric matrix.

...

Currently, ... accepts two parameters, q and rank, explained below.

Details

The parameter q corresponds to the option -q of the QUBIC command line tool, and rank corresponds to the option -r.

For each gene, the algorithm first applies quantile discretization to divide the conditions into negative (lower), unchanged and positive (higher) expression. Conditions with negative or positive expression are considered outliers. If rank > 1, the algorithm then further discretizes the expression values of the outliers in each direction.
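The first discretization step can be sketched in base R as follows. This is a minimal illustration, not rqubic's implementation: the helper discretize_three_level() is hypothetical, and it assumes the cut-offs are simply the q and 1 - q sample quantiles of the gene's values, whereas rqubic (like QUBIC) may place them differently.

```r
# Hypothetical helper (not part of rqubic): three-level discretization of one gene.
# ASSUMPTION: the cut-offs are the plain q and (1 - q) sample quantiles of the
# gene's values; rqubic and QUBIC may place the cut-offs differently.
discretize_three_level <- function(x, q) {
  lower <- quantile(x, probs = q, names = FALSE)
  upper <- quantile(x, probs = 1 - q, names = FALSE)
  out <- integer(length(x))  # 0 = unchanged
  out[x < lower] <- -1L      # negative (lower) outliers
  out[x > upper] <- 1L       # positive (higher) outliers
  out
}

# One gene measured in ten conditions
x <- c(5.1, 2.0, 4.8, 9.7, 5.0, 5.3, 0.4, 8.9, 4.9, 5.2)
discretize_three_level(x, q = 0.2)  # returns 0 -1 0 1 0 0 -1 1 0 0
```

With q = 0.2 and ten conditions, the two lowest values become -1 and the two highest become 1, matching the "2q" share of outliers mentioned in the Description.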

This second discretization step divides the sorted outliers into rank consecutive (tandem) groups containing equal numbers of conditions. A label is assigned to each of these groups in the following order:

-1, -2, …, -rank

for outliers with negative expression, from the most negative group to the least negative group (not the other way around!).

Similarly, for positive outliers, labels in the order of

rank, rank-1, …, 1

are assigned to the groups from the least positive group to the most positive group.

That is, the sign of a label indicates the direction of the expression change, and its absolute value gives the discretized rank among the outliers.
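The labeling scheme above can be sketched for one direction at a time. The helper label_outliers() and its argument r (standing in for the rank parameter) are hypothetical, and the sketch assumes the sorted outliers are split into r consecutive groups of (nearly) equal size; rqubic may handle ties and uneven group sizes differently.

```r
# Hypothetical helper (not part of rqubic): label outliers of one direction
# with the rank-based scheme described above.
# ASSUMPTION: groups are formed by splitting the sorted outliers into r
# consecutive groups of (nearly) equal size; rqubic may treat ties and
# uneven group sizes differently.
label_outliers <- function(values, r, direction = c("negative", "positive")) {
  direction <- match.arg(direction)
  r <- as.integer(r)
  n <- length(values)
  if (n == 0L) return(integer(0))
  # Ascending position of each value (1 = smallest); ties broken by order
  position <- rank(values, ties.method = "first")
  # Group 1 holds the smallest values, group r the largest
  group <- as.integer(ceiling(position * r / n))
  if (direction == "negative") {
    -group            # most negative group -> -1, least negative -> -r
  } else {
    r - group + 1L    # least positive group -> r, most positive -> 1
  }
}

neg_vals <- c(-3.2, -0.9, -1.5, -2.4, -0.7, -4.0)
pos_vals <- c(1.2, 3.5, 2.2, 0.8, 5.1, 1.9)
label_outliers(neg_vals, r = 3, direction = "negative")  # -1 -3 -2 -2 -3 -1
label_outliers(pos_vals, r = 3, direction = "positive")  #  3  1  2  3  1  2
```

In both directions the label with absolute value 1 goes to the most extreme group, which is the ordering the text warns about.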

Value

An object of the same class as the input. For an eSet-derived object, the exprs slot is replaced by the discretized matrix; for a numeric matrix, the discretized matrix itself is returned. In both cases the discretized matrix contains integers.
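For the numeric-matrix case, the shape of the result can be illustrated with a per-gene rule applied to every row. The function discretize_matrix() is hypothetical and uses a simplified three-level rule (q and 1 - q quantile cut-offs, no rank step); it only shows how an integer matrix with the input's dimensions and dimnames comes out, not how rqubic computes it.

```r
# Hypothetical sketch (not rqubic code): apply a per-gene rule to every row of a
# numeric matrix and return an integer matrix with the same dimensions and dimnames.
# ASSUMPTION: a simple three-level rule with q and (1 - q) quantile cut-offs.
discretize_matrix <- function(mat, q) {
  per_gene <- function(x) {
    lower <- quantile(x, probs = q, names = FALSE)
    upper <- quantile(x, probs = 1 - q, names = FALSE)
    as.integer(x > upper) - as.integer(x < lower)  # -1, 0 or 1
  }
  # apply() stacks each gene's result as a column, so transpose back
  out <- t(apply(mat, 1, per_gene))
  dimnames(out) <- dimnames(mat)
  out
}

set.seed(1)
mat <- matrix(rnorm(40), nrow = 4,
              dimnames = list(paste0("gene", 1:4), LETTERS[1:10]))
disc <- discretize_matrix(mat, q = 0.2)
storage.mode(disc)  # "integer"
disc
```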

Note

Note that the discrete matrix produced by this implementation can be slightly different from the one used by the QUBIC command line tool.

The main reason for this is the internal data type: while QUBIC uses float to represent the expression matrix, this implementation uses double.

Using double has the advantages of interfacing directly with R, providing higher precision and avoiding errors caused by floating-point representation. The cost is potentially higher memory use; however, for test data sets (for example the ALL dataset, with more than 120 samples and 12000 genes) both peak memory use (<100 MB) and execution time (0.028 s CPU time) are well under control.
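To put the memory cost in perspective, the raw storage of the expression values alone can be estimated from the dimensions quoted above. This counts only the matrix values, not R object overhead or working copies made during discretization, so it is a lower bound rather than a prediction of the reported peak.

```r
# Back-of-envelope storage for the expression values alone, using the
# dimensions quoted above; R objects add some fixed overhead not counted here.
n_values <- 120 * 12000
bytes_double <- n_values * 8  # R numeric values are 8-byte doubles
bytes_float  <- n_values * 4  # a C float is 4 bytes on common platforms
bytes_double / 1e6  # 11.52 (MB)
bytes_float / 1e6   # 5.76 (MB)
```

Doubling the per-value size adds only a few megabytes at this scale.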

The difference is observed especially often when there are many tied values. Such cases are rare, however, and we assume they do not affect the results to a large extent.
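Why the data type and ties are connected can be illustrated in base R, which has no single-precision type: the hypothetical helper to_float() round-trips values through 4-byte floats with writeBin() and readBin(). Two values that differ in double precision can collapse into a tie in single precision, which can shift quantile cut-offs and group boundaries. This only illustrates the mechanism; it does not reproduce QUBIC's code.

```r
# Emulate single-precision storage by writing each value as a 4-byte float
# and reading it back (writeBin/readBin with size = 4).
to_float <- function(x) {
  readBin(writeBin(x, raw(), size = 4), what = "double", n = length(x), size = 4)
}

x <- c(7, 7.0000001)          # distinct in double precision
length(unique(x))             # 2
length(unique(to_float(x)))   # 1: the two values tie as floats
```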

Author(s)

Jitao David Zhang <jitao_david.zhang@roche.com>

References

Li et al. (2009). QUBIC: a qualitative biclustering algorithm for analyses of gene expression data. Nucleic Acids Research 37:e101.

See Also

parseQubicChars parses the discretized matrix produced by the QUBIC command line tool into a data frame.

Examples

library(Biobase)
data(sample.ExpressionSet, package="Biobase")
sample.disc <- quantileDiscretize(sample.ExpressionSet)
exprs(sample.disc)[1:6, 1:6]

## Equivalently, pass a numeric matrix
sample.mat.disc <- quantileDiscretize(exprs(sample.ExpressionSet))
sample.mat.disc[1:6, 1:6]
## Not run: identical(exprs(sample.disc),sample.mat.disc)

## with multiple ranks
sample.rank3 <- quantileDiscretize(sample.ExpressionSet, rank=3)
exprs(sample.rank3)[1:6, 1:6]

Example output

Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, sd, var, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, basename, cbind, colMeans, colSums, colnames,
    dirname, do.call, duplicated, eval, evalq, get, grep, grepl,
    intersect, is.unsorted, lapply, lengths, mapply, match, mget,
    order, paste, pmax, pmax.int, pmin, pmin.int, rank, rbind,
    rowMeans, rowSums, rownames, sapply, setdiff, sort, table, tapply,
    union, unique, unsplit, which, which.max, which.min

Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

                A  B C D  E F
AFFX-MurIL2_at  1  0 1 1  0 0
AFFX-MurIL10_at 0  1 0 0 -1 0
AFFX-MurIL4_at  1 -1 1 0 -1 0
AFFX-MurFAS_at  1 -1 0 0  1 0
AFFX-BioB-5_at  1  0 0 1  0 0
AFFX-BioB-M_at  1 -1 0 0  0 0
                A  B C D  E F
AFFX-MurIL2_at  1  0 1 1  0 0
AFFX-MurIL10_at 0  1 0 0 -1 0
AFFX-MurIL4_at  1 -1 1 0 -1 0
AFFX-MurFAS_at  1 -1 0 0  1 0
AFFX-BioB-5_at  1  0 0 1  0 0
AFFX-BioB-M_at  1 -1 0 0  0 0
                A  B C D  E F
AFFX-MurIL2_at  1  0 1 3  0 0
AFFX-MurIL10_at 0  3 0 0 -3 0
AFFX-MurIL4_at  1 -3 2 0 -1 0
AFFX-MurFAS_at  3 -3 0 0  1 0
AFFX-BioB-5_at  1  0 0 3  0 0
AFFX-BioB-M_at  3 -3 0 0  0 0

rqubic documentation built on Nov. 8, 2020, 8:20 p.m.