discretize: Discretize expression matrix for qualitative biclustering
In rqubic: Qualitative biclustering algorithm for expression data analysis in R


Performs recursive quantilization of gene expression data across samples to discretize the gene expression matrix. The quantile parameter q determines the estimated proportion of conditions in which a gene is considered differentially expressed (2q in total, covering both up- and down-regulation). The rank parameter, rank, determines how many discrete levels the differentially expressed values (outliers) receive. See details below.

quantileDiscretize(x, ...)

`x`	An object of class `eSet` or of a class inheriting from it; most commonly an `ExpressionSet`. Alternatively, a numeric matrix.
`...`	Currently accepts two parameters, `q` and `rank`, described below.

`q`	Estimated proportion of conditions in which a gene is up- or down-regulated; a value in (0, 0.5), default 0.06. By specifying q, one estimates that in 2q of all conditions the expression value of a gene is an outlier.
`rank`	Number of ranks (levels) for outliers; a positive integer, default 1L. By default, each gene receives one label per condition from {-1, 0, 1}, representing low, unchanged and high expression respectively. If rank > 1, the outliers are further divided into rank levels by recursive quantilization with equal intervals.
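As a quick sanity check on what the two parameters imply, the sketch below computes the expected number of outlier conditions per gene (q per direction, 2q in total) and the set of labels that can appear for a given rank. Both helpers, `expected_outliers` and `possible_labels`, are hypothetical illustrations and are not part of rqubic.

```r
## Hypothetical helper (not in rqubic): expected number of outlier
## conditions per gene, q in each direction and 2q in total.
expected_outliers <- function(n_conditions, q = 0.06) {
  stopifnot(q > 0, q < 0.5)
  c(down = q * n_conditions,
    up = q * n_conditions,
    total = 2 * q * n_conditions)
}

## Hypothetical helper: labels that can occur for a given rank
## (-rank, ..., -1 for low outliers, 0 for unchanged, 1, ..., rank for high).
possible_labels <- function(rank = 1L) {
  rank <- as.integer(rank)
  stopifnot(rank >= 1L)
  -rank:rank
}

expected_outliers(100)  # about 6 down, 6 up, 12 in total
possible_labels(1L)     # -1 0 1
possible_labels(3L)     # -3 -2 -1 0 1 2 3
```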

The parameter q corresponds to the option -q of the QUBIC command line tool, and rank corresponds to -r.

For each gene, the algorithm first applies quantile discretization to divide the conditions into negative (lower), unchanged and positive (higher) expression. Conditions with negative or positive expression are considered outliers. If rank > 1, the algorithm further discretizes the outlier expression values in each direction.
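The first step can be sketched for a single gene as follows. This is a simplified illustration, not rqubic's implementation: it assumes the outlier thresholds are the q and 1 - q sample quantiles from R's default `quantile()`, whereas the package's exact thresholds and tie handling may differ. `discretize_gene` is a hypothetical name.

```r
## Hypothetical helper (not part of rqubic): first-step labels for one gene.
## Assumes the thresholds are the q and (1 - q) sample quantiles computed
## with R's default quantile(); rqubic's exact thresholds may differ.
discretize_gene <- function(x, q = 0.06) {
  stopifnot(is.numeric(x), q > 0, q < 0.5)
  lower <- quantile(x, probs = q, names = FALSE)
  upper <- quantile(x, probs = 1 - q, names = FALSE)
  labels <- integer(length(x))  # 0L: unchanged
  labels[x < lower] <- -1L      # negative (low) outliers
  labels[x > upper] <- 1L       # positive (high) outliers
  labels
}

## Usage: one gene measured in ten conditions
discretize_gene(c(5.1, 2.0, 5.3, 4.9, 5.0, 8.7, 5.2, 4.8, 5.1, 5.0), q = 0.1)
# returns 0 -1 0 0 0 1 0 0 0 0
```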

This second discretization step divides the sorted outliers into rank consecutive groups, each containing the same number of conditions. A label is assigned to each of these groups in the following order:

-1, -2, …, -rank

for outliers with negative expression, from the most negative group to the least negative group (not the other way around!).

Similarly, for positive outliers, labels in the order of

rank, rank-1, …, 1

are assigned to the groups from the least positive group to the most positive group.

That is, the sign of a label indicates the direction of the expression change, and its absolute value represents the discretized rank among the outliers.
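The labeling order described above can be sketched in R. This is an illustration of the ordering as stated on this page, not rqubic's code: `refine_outliers` is a hypothetical helper, and it assumes groups are formed by position in the sorted outliers (the i-th of m outliers falls into group ceiling(i * rank / m)), which gives exactly equal groups only when m is a multiple of rank. The package may place group boundaries differently, especially with ties.

```r
## Hypothetical helper (not part of rqubic): refine first-step labels
## (-1/0/1) into rank levels, following the ordering described above.
refine_outliers <- function(x, labels, rank = 1L) {
  rank <- as.integer(rank)
  stopifnot(length(x) == length(labels), rank >= 1L)
  out <- labels
  # Group index (1..rank) for each of m sorted outliers, equal-size groups
  group_of <- function(m) ceiling(seq_len(m) * rank / m)
  neg <- which(labels < 0L)
  if (length(neg) > 0L) {
    idx <- neg[order(x[neg])]                    # most negative first
    out[idx] <- -as.integer(group_of(length(idx)))         # -1, ..., -rank
  }
  pos <- which(labels > 0L)
  if (length(pos) > 0L) {
    idx <- pos[order(x[pos])]                    # least positive first
    out[idx] <- as.integer(rank + 1L - group_of(length(idx)))  # rank, ..., 1
  }
  out
}

## Usage: four negative and four positive outliers, rank = 2
x <- c(-9, -8, -7, -6, 0, 0, 6, 7, 8, 9)
lab <- c(-1L, -1L, -1L, -1L, 0L, 0L, 1L, 1L, 1L, 1L)
refine_outliers(x, lab, rank = 2L)
# returns -1 -1 -2 -2 0 0 2 2 1 1
```

With rank = 1 every outlier falls into a single group, so the first-step labels are returned unchanged.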

An object of the same class as the input. For an eSet, the exprs slot is replaced by the discretized matrix; for a numeric matrix, the discretized matrix is returned directly. In both cases the discretized values form an integer matrix.
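A minimal sketch of how a per-gene labeling rule can be applied row by row while keeping the input's dimensions and row/column names and storing the result as integers. `apply_by_gene` is a hypothetical helper and the median-based rule in the usage is only a toy stand-in for the quantile discretization; for an `ExpressionSet`, one could assign such a matrix back with Biobase's `exprs<-` replacement, which is not run here.

```r
## Hypothetical helper (not part of rqubic): apply a per-gene labeling
## function to every row and return an integer matrix with the same dimnames.
apply_by_gene <- function(mat, label_fun, ...) {
  stopifnot(is.matrix(mat), is.numeric(mat))
  out <- matrix(0L, nrow = nrow(mat), ncol = ncol(mat),
                dimnames = dimnames(mat))
  for (i in seq_len(nrow(mat))) {
    out[i, ] <- as.integer(label_fun(mat[i, ], ...))
  }
  out
}

## Usage with a toy rule: sign of the deviation from the gene's median
m <- matrix(c(1, 5, 9, 2, 2, 8), nrow = 2, byrow = TRUE,
            dimnames = list(c("g1", "g2"), c("A", "B", "C")))
res <- apply_by_gene(m, function(x) sign(x - median(x)))
res
storage.mode(res)  # "integer"
```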

Note that the discrete matrix produced by this implementation can differ slightly from the one produced by the QUBIC command line tool.

The main reason is the internal data type: QUBIC stores the expression matrix as float, whereas this implementation uses double.

Using double simplifies interfacing with R, gives higher precision, and avoids errors caused by single-precision floating-point representation. The cost is potentially higher memory use; however, for test data sets (for example the ALL dataset, with more than 120 samples and 12,000 genes) the peak memory use (<100 MB) and the execution time (0.028 s CPU time) remain well under control.

The difference is most often observed when there are many tied values. Such cases are, however, rare, and we assume they do not affect the results to a large extent.
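One plausible way the storage type interacts with ties is shown below: two values that are distinct as doubles can round to the same single-precision float, creating a tie that may change quantile thresholds or the ordering of outliers. This is a generic illustration, not a reproduction of a specific QUBIC discrepancy; `as_single` is a hypothetical helper that emulates float storage by round-tripping through base R's `writeBin`/`readBin` with `size = 4`.

```r
## Hypothetical helper: round a double vector to single precision (as float
## storage would) and back, using base R binary I/O on a raw vector.
as_single <- function(x) {
  readBin(writeBin(as.double(x), raw(), size = 4),
          what = "numeric", n = length(x), size = 4)
}

x <- c(7.0000001, 7.0000002)  # distinct in double precision
x[1] == x[2]                  # FALSE
f <- as_single(x)
f[1] == f[2]                  # TRUE: both round to the same float, 7
```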

Jitao David Zhang <jitao_david.zhang@roche.com>

Li et al. (2009). QUBIC: a qualitative biclustering algorithm for analyses of gene expression data. Nucleic Acids Research 37:e101.

`parseQubicChars` parses the discretized matrix produced by the QUBIC command line tool into a data frame.

library(Biobase)
library(rqubic)  # provides quantileDiscretize
data(sample.ExpressionSet, package="Biobase")

## Discretize an ExpressionSet with the defaults (q = 0.06, rank = 1L)
sample.disc <- quantileDiscretize(sample.ExpressionSet)
exprs(sample.disc)[1:6, 1:6]

## Equivalently, pass a numeric matrix; a matrix is returned
sample.mat.disc <- quantileDiscretize(exprs(sample.ExpressionSet))
sample.mat.disc[1:6, 1:6]
## Not run: identical(exprs(sample.disc), sample.mat.disc)

## With multiple ranks, outliers are labeled -3..-1 and 1..3
sample.rank3 <- quantileDiscretize(sample.ExpressionSet, rank=3)
exprs(sample.rank3)[1:6, 1:6]


                A  B C D  E F
AFFX-MurIL2_at  1  0 1 1  0 0
AFFX-MurIL10_at 0  1 0 0 -1 0
AFFX-MurIL4_at  1 -1 1 0 -1 0
AFFX-MurFAS_at  1 -1 0 0  1 0
AFFX-BioB-5_at  1  0 0 1  0 0
AFFX-BioB-M_at  1 -1 0 0  0 0
                A  B C D  E F
AFFX-MurIL2_at  1  0 1 1  0 0
AFFX-MurIL10_at 0  1 0 0 -1 0
AFFX-MurIL4_at  1 -1 1 0 -1 0
AFFX-MurFAS_at  1 -1 0 0  1 0
AFFX-BioB-5_at  1  0 0 1  0 0
AFFX-BioB-M_at  1 -1 0 0  0 0
                A  B C D  E F
AFFX-MurIL2_at  1  0 1 3  0 0
AFFX-MurIL10_at 0  3 0 0 -3 0
AFFX-MurIL4_at  1 -3 2 0 -1 0
AFFX-MurFAS_at  3 -3 0 0  1 0
AFFX-BioB-5_at  1  0 0 3  0 0
AFFX-BioB-M_at  3 -3 0 0  0 0