Extraction of Biclusters

Description

extractBic: R implementation of extractBic.

Usage

1
extractBic(fact,thresZ=0.5,thresL=NULL)

Arguments

fact

object of the class Factorization.

thresZ

threshold for sample belonging to bicluster; default 0.5.

thresL

threshold for loading belonging to bicluster (if not given it is estimated).

Details

Essentially the model is the sum of outer products of vectors:

X = ∑_{i=1}^{p} λ_i z_i^T + U

where the number of summands p is the number of biclusters. The matrix factorization is

X = L Z + U

Here λ_i are from R^n, z_i from R^l, L from R^{n \times p}, Z from R^{p \times l}, and X, U from R^{n \times l}.

U is the Gaussian noise with a diagonal covariance matrix which entries are given by Psi.

The Z is locally approximated by a Gaussian with inverse variance given by lapla.

Using these values we can computer for each j the variance z_i given x_j. Here

x_j = L z_j + u_j

This variance can be used to determine the information content of a bicluster.

The λ_i and z_i are used to extract the bicluster i, where a threshold determines which observations and which samples belong the the bicluster.

In bic the biclusters are extracted according to the largest absolute values of the component i, i.e. the largest values of λ_i and the largest values of z_i . The factors z_i are normalized to variance 1.

The components of bic are binp, bixv, bixn, biypv, and biypn.

binp give the size of the bicluster: number observations and number samples. bixv gives the values of the extracted observations that have absolute values above a threshold. They are sorted. bixn gives the extracted observation names (e.g. gene names). biypv gives the values of the extracted samples that have absolute values above a threshold. They are sorted. biypn gives the names of the extracted samples (e.g. sample names).

In bicopp the opposite clusters to the biclusters are given. Opposite means that the negative pattern is present.

The components of opposite clusters bicopp are binn, bixv, bixn, biypnv, and biynn.

binp give the size of the opposite bicluster: number observations and number samples. bixv gives the values of the extracted observations that have absolute values above a threshold. They are sorted. bixn gives the extracted observation names (e.g. gene names). biynv gives the values of the opposite extracted samples that have absolute values above a threshold. They are sorted. biynn gives the names of the opposite extracted samples (e.g. sample names).

That means the samples are divided into two groups where one group shows large positive values and the other group has negative values with large absolute values. That means a observation pattern can be switched on or switched off relative to the average value.

numn gives the indices of bic with components: numng = bix and numnp = biypn.

numn gives the indices of bicopp with components: numng = bix and numnn = biynn.

Implementation in R.

Value

bic

extracted biclusters.

numn

indexes for the extracted biclusters.

bicopp

extracted opposite biclusters.

numnopp

indexes for the extracted opposite biclusters.

X

scaled and centered data matrix.

np

number of biclusters.

Author(s)

Sepp Hochreiter

See Also

fabia, fabias, fabiap, fabi, fabiasp, mfsc, nmfdiv, nmfeu, nmfsc, extractPlot, extractBic, plotBicluster, Factorization, projFuncPos, projFunc, estimateMode, makeFabiaData, makeFabiaDataBlocks, makeFabiaDataPos, makeFabiaDataBlocksPos, matrixImagePlot, fabiaDemo, fabiaVersion

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
#---------------
# TEST
#---------------

dat <- makeFabiaDataBlocks(n = 100,l= 50,p = 3,f1 = 5,f2 = 5,
  of1 = 5,of2 = 10,sd_noise = 3.0,sd_z_noise = 0.2,mean_z = 2.0,
  sd_z = 1.0,sd_l_noise = 0.2,mean_l = 3.0,sd_l = 1.0)

X <- dat[[1]]
Y <- dat[[2]]


resEx <- fabia(X,3,0.01,20)

rEx <- extractBic(resEx)

rEx$bic[1,]
rEx$bic[2,]
rEx$bic[3,]


## Not run: 
#---------------
# DEMO1
#---------------

dat <- makeFabiaDataBlocks(n = 1000,l= 100,p = 10,f1 = 5,f2 = 5,
  of1 = 5,of2 = 10,sd_noise = 3.0,sd_z_noise = 0.2,mean_z = 2.0,
  sd_z = 1.0,sd_l_noise = 0.2,mean_l = 3.0,sd_l = 1.0)

X <- dat[[1]]
Y <- dat[[2]]


resToy <- fabia(X,13,0.01,200)

rToy <- extractBic(resToy)

avini(resToy)

rToy$bic[1,]
rToy$bic[2,]
rToy$bic[3,]

#---------------
# DEMO2
#---------------


avail <- require(fabiaData)

if (!avail) {
    message("")
    message("")
    message("#####################################################")
    message("Package 'fabiaData' is not available: please install.")
    message("#####################################################")
} else {

data(Breast_A)

X <- as.matrix(XBreast)

resBreast <- fabia(X,5,0.1,200)

rBreast <- extractBic(resBreast)

avini(resBreast)

rBreast$bic[1,]
rBreast$bic[2,]
rBreast$bic[3,]
}


## End(Not run)

Want to suggest features or report bugs for rdrr.io? Use the GitHub issue tracker.