Factorization instances

Description

Factorization is a class to store results of matrix factorization algorithms. It has been designed for biclustering but can be used for "principal component analysis", "singular value decomposition", "independent component analysis", "factor analysis", and "non-negative matrix factorization".

Usage

## S4 method for signature 'Factorization'
plot(x, Rm=NULL, Cm=NULL, dim = c(1, 2),
    zoom = rep(1, 2), col.group = NULL,
    colors = c("orange1", "red", rainbow(length(unique(col.group)),
               start=2/6, end=4/6)),
    col.areas = TRUE, col.symbols = c(1, rep(2, length(unique(col.group)))),
    sampleNames = TRUE, rot = rep(-1, length(dim)),
    labels = NULL, label.tol = 0.1, lab.size = 0.725, col.size = 10,
    row.size = 10, do.smoothScatter = FALSE, 
    do.plot = TRUE, ... )

## S4 method for signature 'Factorization'
show(object)

## S4 method for signature 'Factorization'
showSelected(object, which=c(1,2,3,4))

## S4 method for signature 'Factorization'
summary(object, ...) 

Arguments

PLOT:

x

object of the class Factorization.

Rm

row weighting vector. If NULL, it defaults to rep(1,nrow(L(x))).

Cm

column weighting vector. If NULL, it defaults to rep(1,ncol(Z(x))).

dim

optional principal factors that are plotted along the horizontal and vertical axis. Defaults to c(1,2).

zoom

optional zoom factor for row and column items. Defaults to c(1,1).

col.group

optional vector (character or numeric) indicating the different groupings of the columns. Defaults to 1.

colors

vector specifying the colors for the annotation of the plot; the first two elements concern the rows; the third till the last element concern the columns; the first element will be used to color the unlabeled rows; the second element for the labeled rows and the remaining elements to give different colors to different groups of columns. Defaults to c("orange1", "red", rainbow(length(unique(col.group)), start=2/6, end=4/6)).

col.areas

logical value indicating whether columns should be plotted as squares with areas proportional to their marginal mean and colors representing the different groups (TRUE), or with symbols representing the groupings and identical size (FALSE). Defaults to TRUE.

col.symbols

vector of symbols used when col.areas=FALSE; corresponds to the pch argument of the function plot. Defaults to c(1, rep(2, length(unique(col.group)))).
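To make the two default expressions above concrete, the following base R sketch evaluates them for a hypothetical grouping of six columns. The grouping vector and the way group colors are matched to columns (in order of first appearance) are assumptions made for illustration; the plot method may assign them differently.

```r
# Hypothetical grouping of six columns into three groups (illustration only).
col.group <- c("a", "a", "b", "b", "c", "c")
n.groups <- length(unique(col.group))

# Default expressions copied from the plot() usage.
colors <- c("orange1", "red", rainbow(n.groups, start = 2/6, end = 4/6))
col.symbols <- c(1, rep(2, n.groups))

# Elements 1 and 2 concern the rows; the remaining elements concern the
# column groups. Matching groups in order of first appearance is an
# assumption of this sketch, not a documented rule.
group.colors <- colors[-(1:2)][match(col.group, unique(col.group))]
names(group.colors) <- col.group

print(colors)
print(group.colors)
```

The colors vector therefore has two more elements than there are column groups, which is worth keeping in mind when supplying a custom vector.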

sampleNames

either a logical vector of length one or a character vector of length equal to the number of samples in the dataset. If a logical is provided, sample names will be displayed on the plot (TRUE; default) or not (FALSE); if a character vector is provided, the names provided will be used to label the samples instead of the default column names.

rot

rotation of plot. Defaults to c(-1,-1).

labels

character vector to be used for labeling points on the graph; if NULL (default), the row names of x are used instead.

label.tol

numerical value specifying either the percentile (label.tol<=1) of rows or the number of rows (label.tol>1) most distant from the plot-center (0,0) that are labeled and are plotted as circles with area proportional to the marginal means of the original data. Defaults to 0.1.

lab.size

size of identifying labels for row- and column-items as cex parameter of the text function. Defaults to 0.725.

col.size

size of the column symbols in mm. Defaults to 10.

row.size

size of the row symbols in mm. Defaults to 10.

do.smoothScatter

logical value indicating whether smoothScatter is used instead of plotting individual points. Defaults to FALSE.

do.plot

logical value indicating whether a plot is produced. Defaults to TRUE.

...

further arguments are passed on to eqscaleplotLoc which draws the canvas for the plot; useful for adding a main or a custom sub.

SHOW:

object

An instance of Factorization-class.

SHOWSELECTED:

see object at show.

which

used to provide a list of which plots should be generated: 1=the information content of biclusters, 2=the information content of samples, 3=the loadings per bicluster, 4=the factors per bicluster, default c(1,2,3,4).

SUMMARY:

see object at show.

...

further arguments.

Details

Plot

Produces a biplot of a matrix factorization result stored in an instance of the Factorization class.

The function plot is based on the function plot.mpm in the R package mpm (Version: 1.0-16, Date: 2009-08-26, Title: Multivariate Projection Methods, Maintainer: Tobias Verbeke <tobias.verbeke@openanalytics.be>, Author: Luc Wouters <wouters_luc@telenet.be>).

Biclusters are found by sparse factor analysis where both the factors and the loadings are sparse.

Essentially the model is the sum of outer products of vectors:

X = ∑_{i=1}^{p} λ_i z_i^T + U

where the number of summands p is the number of biclusters. The matrix factorization is

X = L Z + U

Here λ_i ∈ R^n, z_i ∈ R^l, L ∈ R^{n × p}, Z ∈ R^{p × l}, and X, U ∈ R^{n × l}.

For noise-free projections such as independent component analysis, the noise term is set to zero: U = 0.
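The equivalence between the sum of outer products and the matrix product L Z can be checked with a few lines of base R. This is a minimal sketch with made-up sparse vectors, not output of any fabia function; the sizes and values are assumptions chosen so that each bicluster occupies its own block.

```r
set.seed(1)
n <- 6; l <- 5; p <- 2

# Sparse loadings (columns of L are the lambda_i) and sparse factors
# (rows of Z are the z_i^T). Values are arbitrary illustration data.
L <- matrix(0, n, p)
Z <- matrix(0, p, l)
L[1:3, 1] <- c(2, -1, 3);  Z[1, 1:2] <- c(1, 2)
L[4:6, 2] <- c(1, 1, -2);  Z[2, 3:5] <- c(-1, 2, 1)

# Sum of p outer products lambda_i z_i^T.
S <- matrix(0, n, l)
for (i in seq_len(p)) {
  S <- S + outer(L[, i], Z[i, ])
}

# Additive noise U gives the observed matrix X = L Z + U.
U <- matrix(rnorm(n * l, sd = 0.1), n, l)
X <- S + U

# The sum of outer products equals the matrix product.
print(all.equal(S, L %*% Z))
```

Because both λ_i and z_i are sparse, each outer product is nonzero only on a block of rows and columns, which is what makes a summand a bicluster.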

The argument label.tol can be used to select the most informative rows, i.e. the rows that are most distant from the center of the plot (label.tol<=1: percentile of rows; label.tol>1: number of rows).

Only these row-items are then labeled and represented as circles with their areas proportional to the row weighting.

If the column-items are grouped, the groups given by col.group can be visualized by different colors.
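The two interpretations of label.tol can be sketched in base R as follows. The helper select_rows is hypothetical and is not part of fabia or mpm; it only illustrates the rule described above (fraction of rows for values up to 1, number of rows for larger values) and the actual plot method may handle ties and rounding differently.

```r
# Hypothetical helper: flag the rows most distant from the plot center (0, 0).
select_rows <- function(x, y, label.tol = 0.1) {
  d <- x^2 + y^2                        # squared distance from (0, 0)
  n.sel <- if (label.tol > 1) {
    min(length(d), floor(label.tol))    # interpreted as a number of rows
  } else {
    round(label.tol * length(d))        # interpreted as a fraction of rows
  }
  sel <- rep(FALSE, length(d))
  sel[head(order(d, decreasing = TRUE), n.sel)] <- TRUE
  sel
}

set.seed(42)
x <- rnorm(200)
y <- rnorm(200)

print(sum(select_rows(x, y, label.tol = 0.1)))  # 10 percent of 200 rows
print(sum(select_rows(x, y, label.tol = 15)))   # the 15 most distant rows
```

The returned logical vector plays the same role as the Select indicator described in the Value section of plot.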

Show

Statistics of a matrix factorization result stored in an instance of the Factorization class.

This function supplies statistics on a matrix factorization result which is stored as an instance of Factorization-class.

The following is plotted:

  1. the information content of biclusters.

  2. the information content of samples.

  3. the loadings per bicluster.

  4. the factors per bicluster.

ShowSelected

Lists selected statistics of a matrix factorization result stored in an instance of the Factorization class.

This function supplies selected statistics on a matrix factorization result which is stored as an instance of Factorization-class.

The following is plotted depending on the display selection variable which:

  1. the information content of biclusters.

  2. the information content of samples.

  3. the loadings per bicluster.

  4. the factors per bicluster.

Summary

Summary of matrix factorization result stored in an instance of the Factorization class.

This function gives information on a matrix factorization result which is stored as an instance of Factorization-class.

The summary consists of the following items:

  1. the number of rows and columns of the original matrix.

  2. the number of clusters for rows and columns.

  3. for the row cluster the information content is given.

  4. for each column its information is given.

  5. for each column cluster a summary is given.

  6. for each row cluster a summary is given.

Value

FACTORIZATION:

An instance of Factorization-class.

PLOT:

Rows

a list with the X and Y coordinates of the rows and an indication Select of whether the row was selected according to label.tol.

Columns

a list with the X and Y coordinates of the columns.

SHOW:

no value.

SHOWSELECTED:

no value.

SUMMARY:

no value.

Slots

Objects of class Factorization have the following slots:

parameters:

Saves parameters of the factorization method in a list: ("method","number of cycles","sparseness weight","sparseness prior for loadings","sparseness prior for factors","number biclusters","projection sparseness loadings", "projection sparseness factors","initialization range","are loadings rescaled after each iterations","normalization = scaling of rows","centering method of rows","parameter for method").

n:

number of rows, left dimension.

p1:

right dimension of left matrix.

p2:

left dimension of right matrix.

l:

number of columns, right dimension.

center:

vector of the centers.

scaleData:

vector of the scaling factors.

X:

centered and scaled data matrix n x l.

L:

left matrix n x p1.

Z:

right matrix p2 x l.

M:

middle matrix p1 x p2.

LZ:

matrix L x M x Z.

U:

noise matrix.

avini:

information of each bicluster, vector of length p2.

xavini:

information extracted from each sample, vector of length l.

ini:

information of each bicluster in each sample, matrix p2 x l.

Psi:

noise variance per row, vector of length n.

lapla:

prior information for each sample, vector of length l.

Constructor

Constructor of class Factorization.

Factorization(parameters=list(),n=1,p1=1,p2=1,l=1,center=as.vector(1),scaleData=as.vector(1),X=as.matrix(1),L=as.matrix(1),Z=as.matrix(1),M=as.matrix(1),LZ=as.matrix(1),U=as.matrix(1),avini=as.vector(1),xavini=as.vector(1),ini=as.matrix(1),Psi=as.vector(1),lapla=as.matrix(1))

Accessors

In the following x denotes a Factorization object.

parameters(x), parameters(x) <- value: Returns or sets parameters, where the return value and value are both an instance of list. Parameters of the factorization method are stored in a list: ("method","number of cycles","sparseness weight","sparseness prior for loadings","sparseness prior for factors","number biclusters","projection sparseness loadings", "projection sparseness factors","initialization range","are loadings rescaled after each iterations","normalization = scaling of rows","centering method of rows","parameter for method").

n(x), n(x) <- value: Returns or sets n, where the return value and value are both an instance of numeric. Number of rows, left dimension.

p1(x), p1(x) <- value: Returns or sets p1, where the return value and value are both an instance of numeric. Right dimension of left matrix.

p2(x), p2(x) <- value: Returns or sets p2, where the return value and value are both an instance of numeric. Left dimension of right matrix.

l(x), l(x) <- value: Returns or sets l, where the return value and value are both an instance of numeric. Number of columns, right dimension.

center(x), center(x) <- value: Returns or sets center, where the return value and value are both an instance of numeric. Vector of the centers.

scaleData(x), scaleData(x) <- value: Returns or sets scaleData, where the return value and value are both an instance of numeric. Vector of the scaling factors.

X(x), X(x) <- value: Returns or sets X, where the return value and value are both an instance of matrix. Centered and scaled data matrix n x l.

L(x), L(x) <- value: Returns or sets L, where the return value and value are both an instance of matrix. Left matrix n x p1.

Z(x), Z(x) <- value: Returns or sets Z, where the return value and value are both an instance of matrix. Right matrix p2 x l.

M(x), M(x) <- value: Returns or sets M, where the return value and value are both an instance of matrix. Middle matrix p1 x p2.

LZ(x), LZ(x) <- value: Returns or sets LZ, where the return value and value are both an instance of matrix. Matrix L x M x Z.

U(x), U(x) <- value: Returns or sets U, where the return value and value are both an instance of matrix. Noise matrix.

avini(x), avini(x) <- value: Returns or sets avini, where the return value and value are both an instance of numeric. Information of each bicluster, vector of length p2.

xavini(x), xavini(x) <- value: Returns or sets xavini, where the return value and value are both an instance of numeric. Information extracted from each sample, vector of length l.

ini(x), ini(x) <- value: Returns or sets ini, where the return value and value are both an instance of matrix. Information of each bicluster in each sample, matrix p2 x l.

Psi(x), Psi(x) <- value: Returns or sets Psi, where the return value and value are both an instance of numeric. Noise variance per row, vector of length n.

lapla(x), lapla(x) <- value: Returns or sets lapla, where the return value and value are both an instance of matrix. Prior information for each sample, vector of length l.

Signatures

plot

signature(x = "Factorization", y = "missing")

Plot of a matrix factorization result

show

signature(object = "Factorization")

Display statistics of a matrix factorization result

showSelected

signature(object = "Factorization", which = "numeric")

Display particular statistics of a matrix factorization result

summary

signature(object = "Factorization")

Summary of matrix factorization result

Functions that return objects of this class

Factorization objects are returned by fabia, fabias, fabiap, fabiasp, mfsc, nmfsc, nmfdiv, and nmfeu.

Extension to store results of other methods

The class Factorization may contain the result of different matrix factorization methods. The methods may be generative or not.

Methods may be "singular value decomposition" (M contains singular values as well as avini, L and Z are orthonormal matrices), "independent component analysis" (Z contains the projection/sources, L is the mixing matrix, M is unity), "factor analysis" (Z contains factors, L the loadings, M is unity, U the noise, Psi the noise covariance, lapla is a variational parameter for non-Gaussian factors, avini and ini are the information the factors convey about the observations).
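As an illustration of the singular value decomposition case, the base R sketch below computes the matrices that would correspond to the slots L, M, Z, LZ, U and avini. It uses only svd from base R on random data and does not construct a Factorization object; how the result is then stored in the class is left open here.

```r
set.seed(7)
X <- matrix(rnorm(8 * 5), nrow = 8, ncol = 5)   # arbitrary n x l data

s <- svd(X)

L <- s$u          # left matrix, n x p1, orthonormal columns
M <- diag(s$d)    # middle matrix, p1 x p2, singular values on the diagonal
Z <- t(s$v)       # right matrix, p2 x l, orthonormal rows

LZ <- L %*% M %*% Z   # matrix L x M x Z
U  <- X - LZ          # noise matrix, numerically zero for a full SVD
avini <- s$d          # singular values, as described for avini above

print(dim(L)); print(dim(M)); print(dim(Z))
print(max(abs(U)))
```

In this setting p1 and p2 coincide and M is diagonal, whereas for independent component analysis and factor analysis M is unity.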

Author(s)

Sepp Hochreiter

See Also

fabia, fabias, fabiap, fabi, fabiasp, mfsc, nmfdiv, nmfeu, nmfsc, extractPlot, extractBic, plotBicluster, Factorization, projFuncPos, projFunc, estimateMode, makeFabiaData, makeFabiaDataBlocks, makeFabiaDataPos, makeFabiaDataBlocksPos, matrixImagePlot, fabiaDemo, fabiaVersion

Examples

###################
# TEST
###################


#------------------
#   PLOT
#------------------



n=200
l=100
p=4

dat <- makeFabiaDataBlocks(n = n,l= l,p = p,f1 = 5,f2 = 5,
  of1 = 5,of2 = 10,sd_noise = 3.0,sd_z_noise = 0.2,mean_z = 2.0,
  sd_z = 1.0,sd_l_noise = 0.2,mean_l = 3.0,sd_l = 1.0)

X <- dat[[1]]
ZC <- dat[[3]]
LC <- dat[[4]]


resEx <- fabia(X,p,0.01,400)


gclab <- rep.int(0,l)
gllab <- rep.int(0,n)
clab <- as.character(1:l)
llab <- as.character(1:n)
for (i in 1:p){
 for (j in ZC[i]){
     clab[j] <- paste(as.character(i),"_",clab[j],sep="")
 }
 for (j in LC[i]){
     llab[j] <- paste(as.character(i),"_",llab[j],sep="")
 }
 gclab[unlist(ZC[i])] <- gclab[unlist(ZC[i])] + p^i
 gllab[unlist(LC[i])] <- gllab[unlist(LC[i])] + p^i
}


groups <- gclab

colnames(X(resEx)) <- clab

rownames(X(resEx)) <- llab


plot(resEx,dim=c(1,2),label.tol=0.1,col.group = groups,lab.size=0.6)
plot(resEx,dim=c(1,3),label.tol=0.1,col.group = groups,lab.size=0.6)
plot(resEx,dim=c(2,3),label.tol=0.1,col.group = groups,lab.size=0.6)



#------------------
#   SHOW
#------------------


dat <- makeFabiaDataBlocks(n = 100,l= 50,p = 3,f1 = 5,f2 = 5,
  of1 = 5,of2 = 10,sd_noise = 3.0,sd_z_noise = 0.2,mean_z = 2.0,
  sd_z = 1.0,sd_l_noise = 0.2,mean_l = 3.0,sd_l = 1.0)

X <- dat[[1]]


resEx <- fabia(X,3,0.01,100)

show(resEx)



#------------------
# SHOWSELECTED
#------------------

dat <- makeFabiaDataBlocks(n = 100,l= 50,p = 3,f1 = 5,f2 = 5,
  of1 = 5,of2 = 10,sd_noise = 3.0,sd_z_noise = 0.2,mean_z = 2.0,
  sd_z = 1.0,sd_l_noise = 0.2,mean_l = 3.0,sd_l = 1.0)

X <- dat[[1]]


resEx <- fabia(X,3,0.01,100)

showSelected(resEx,which=1)
showSelected(resEx,which=2)



#------------------
# SUMMARY
#------------------

dat <- makeFabiaDataBlocks(n = 100,l= 50,p = 3,f1 = 5,f2 = 5,
  of1 = 5,of2 = 10,sd_noise = 3.0,sd_z_noise = 0.2,mean_z = 2.0,
  sd_z = 1.0,sd_l_noise = 0.2,mean_l = 3.0,sd_l = 1.0)

X <- dat[[1]]


resEx <- fabia(X,3,0.01,100)

summary(resEx)
