sparseBC: Sparse biclustering
In sparseBC: Sparse Biclustering of Transposable Data



This function performs sparse biclustering on an n by p matrix. Details are given in Tan and Witten (2014).

sparseBC(x, k, r, lambda, nstart = 20, Cs.init = NULL, Ds.init = NULL, max.iter = 1000, threshold = 1e-10, center = TRUE)

`x`	Data matrix; rows are samples (observations) and columns are features. Cannot contain missing values.
`k`	The number of row clusters, i.e., the number of clusters for the observations.
`r`	The number of column clusters, i.e., the number of clusters for the features.
`lambda`	Non-negative regularization parameter for the lasso penalty on the mean of each bicluster; lambda=0 means no regularization.
`nstart`	The number of random starts used by the kmeans function. The default is 20.
`Cs.init`	Starting values for the row labels. The default is NULL, in which case k-means clustering is performed on the rows to estimate the row labels.
`Ds.init`	Starting values for the column labels. The default is NULL, in which case k-means clustering is performed on the columns to estimate the column labels.
`max.iter`	Maximum number of iterations. The default value is 1000 iterations.
`threshold`	Threshold value for convergence. The default is 1e-10.
`center`	Whether to mean-center the data matrix before performing sparse biclustering. The default is TRUE.
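
The documented default for Cs.init and Ds.init (k-means on the rows of x for the row labels, and on the rows of t(x) for the column labels) can be sketched in base R as follows. `init_labels` is a hypothetical helper written for illustration, not a function exported by sparseBC, and the sketch assumes nstart is passed straight to kmeans.

```r
# Hypothetical helper (not part of sparseBC): mimics the documented default
# when Cs.init / Ds.init are NULL, i.e. k-means on the rows of x for the row
# labels and k-means on the rows of t(x) for the column labels.
init_labels <- function(x, k, r, nstart = 20, Cs.init = NULL, Ds.init = NULL) {
  Cs <- if (is.null(Cs.init)) kmeans(x, centers = k, nstart = nstart)$cluster else Cs.init
  Ds <- if (is.null(Ds.init)) kmeans(t(x), centers = r, nstart = nstart)$cluster else Ds.init
  list(Cs = Cs, Ds = Ds)
}

set.seed(1)
x <- matrix(rnorm(30 * 12), nrow = 30, ncol = 12)
labels <- init_labels(x, k = 3, r = 2)
table(labels$Cs)   # sizes of the 3 row clusters
table(labels$Ds)   # sizes of the 2 column clusters

# User-supplied starting labels are used as given
fixed <- init_labels(x, k = 3, r = 2, Cs.init = rep(1:3, each = 10))
```

Supplying Cs.init or Ds.init skips the corresponding k-means call, which is useful when reproducible or domain-informed starting labels are wanted.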

This function implements sparse biclustering using Algorithm 1 of Tan and Witten (2014), 'Sparse biclustering of transposable data', which estimates row labels for the observations and column labels for the features. A lasso penalty on the mean of each bicluster encourages these means to be sparse, that is, some of them are estimated to be exactly zero.
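
A minimal base-R sketch of the two kinds of update that alternate in such an algorithm, assuming the objective is the within-bicluster sum of squares plus lambda times the sum of |mu_kr|. For fixed labels, minimizing over mu_kr gives mu_kr = S(sum of x_ij over bicluster (k, r), lambda/2) / (|C_k| |D_r|), where S(a, t) = sign(a) max(|a| - t, 0) is soft-thresholding; for fixed means, each row moves to the cluster whose mean profile is closest in squared error. The helpers `soft_threshold`, `update_means` and `update_row_labels` are hypothetical, and the package's own code may differ in details such as how empty biclusters are handled.

```r
# Hypothetical helpers (not exported by sparseBC) sketching one pass of the
# alternating updates for the assumed objective
#   sum_{k,r} sum_{i in C_k, j in D_r} (x_ij - mu_kr)^2 + lambda * sum_{k,r} |mu_kr|

# Soft-thresholding operator S(a, t) = sign(a) * max(|a| - t, 0)
soft_threshold <- function(a, t) sign(a) * pmax(abs(a) - t, 0)

# Mean step: labels fixed (Cs in 1..k, Ds in 1..r); returns the k x r matrix
# mu_kr = S(block sum, lambda / 2) / block size. Empty blocks are left at 0.
update_means <- function(x, Cs, Ds, k, r, lambda) {
  Mus <- matrix(0, nrow = k, ncol = r)
  for (a in seq_len(k)) {
    for (b in seq_len(r)) {
      block <- x[Cs == a, Ds == b, drop = FALSE]
      if (length(block) > 0) {
        Mus[a, b] <- soft_threshold(sum(block), lambda / 2) / length(block)
      }
    }
  }
  Mus
}

# Row step: means and column labels fixed; each row goes to the cluster a
# whose expanded mean profile Mus[a, Ds] is closest in squared error.
# The penalty does not depend on the labels, so it drops out here.
update_row_labels <- function(x, Mus, Ds) {
  profiles <- Mus[, Ds, drop = FALSE]                                 # k x p
  cost <- apply(profiles, 1, function(m) rowSums(sweep(x, 2, m)^2))   # n x k
  max.col(-cost, ties.method = "first")
}

# Toy data: one shifted block in the top-left corner
set.seed(2)
x <- matrix(rnorm(40 * 20), nrow = 40, ncol = 20)
x[1:20, 1:10] <- x[1:20, 1:10] + 3
Cs <- rep(1:2, each = 20)
Ds <- rep(1:2, each = 10)

Mus <- update_means(x, Cs, Ds, k = 2, r = 2, lambda = 0)          # plain block means
Mus.sparse <- update_means(x, Cs, Ds, k = 2, r = 2, lambda = 400) # each 200-entry block shrunk by 400 / (2 * 200) = 1
Cs.new <- update_row_labels(x, Mus, Ds)
Ds.new <- update_row_labels(t(x), t(Mus), Cs)  # column step on the transpose
```

The column step is the same computation applied to t(x), t(Mus) and Cs, as in the last line. With lambda = 400, the three null blocks are set to exactly zero while the shifted block's mean is pulled one unit towards zero, which is the sparsity effect the lasso penalty is meant to produce.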

If center=TRUE, the data matrix x is mean-centered before sparse biclustering is performed. The reported mean matrix mus is then the sum of the subtracted mean, mean(x), and the mean matrix estimated by sparse biclustering of the centered data.

Note that with center=TRUE, in general no estimated mean will be exactly zero unless the data are already centered so that mean(x)=0. Instead, when center=TRUE, elements of the mean matrix are shrunk towards mean(x).
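
To see why centering prevents exact zeros, the sketch below estimates bicluster means on x - mean(x) and adds mean(x) back, as the paragraphs above describe. It assumes a soft-thresholded block-mean update, which is an assumption about the internals; `centered_block_means` is a hypothetical helper, not part of sparseBC. With a very large lambda, every estimate collapses to mean(x) rather than to zero.

```r
# Soft-thresholding operator S(a, t) = sign(a) * max(|a| - t, 0)
soft_threshold <- function(a, t) sign(a) * pmax(abs(a) - t, 0)

# Hypothetical helper (not part of sparseBC): lasso-shrunken bicluster means
# estimated on the centered data x - mean(x), with mean(x) added back.
# Assumes labels Cs in 1..k and Ds in 1..r.
centered_block_means <- function(x, Cs, Ds, lambda) {
  m <- mean(x)
  xc <- x - m
  Mus <- matrix(0, nrow = max(Cs), ncol = max(Ds))
  for (a in seq_len(nrow(Mus))) {
    for (b in seq_len(ncol(Mus))) {
      block <- xc[Cs == a, Ds == b, drop = FALSE]
      if (length(block) > 0) {
        Mus[a, b] <- soft_threshold(sum(block), lambda / 2) / length(block)
      }
    }
  }
  Mus + m  # shrinkage is therefore towards mean(x), not towards 0
}

set.seed(3)
x <- matrix(rnorm(20 * 10, mean = 5), nrow = 20, ncol = 10)
Cs <- rep(1:2, each = 10)
Ds <- rep(1:2, each = 5)

no.penalty   <- centered_block_means(x, Cs, Ds, lambda = 0)    # ordinary block means of x
huge.penalty <- centered_block_means(x, Cs, Ds, lambda = 1e6)  # every entry equals mean(x)
```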

An object of class sparseBC.

Among other internal variables, this object includes the following elements:

`Cs`	The estimated row cluster labels, one per observation.
`Ds`	The estimated column cluster labels, one per feature.
`objs`	The minimized value of the l1-penalized log-likelihood objective.
`mus`	The estimated n by p mean matrix for the entire data matrix.
`Mus`	The estimated mean of each bicluster, as a k by r matrix.
`iteration`	The number of iterations until convergence.
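
The relationship between the label vectors and the two mean matrices can be illustrated with toy values, assuming mus is simply Mus expanded by the labels (with center=TRUE the mean(x) offset described above also enters mus; whether Mus carries that offset is not stated here). The sketch also computes a penalized squared-error objective, which matches a Gaussian l1-penalized log-likelihood only up to scaling and constant terms, so its relation to objs is an assumption. All objects below and the helper `penalized_sse` are hypothetical.

```r
# Hypothetical stand-ins for the Cs, Ds and Mus elements of a fitted object
Cs  <- c(1, 1, 2, 2, 2)            # row labels (n = 5, k = 2)
Ds  <- c(2, 1, 1, 2)               # column labels (p = 4, r = 2)
Mus <- matrix(c(1.5, -0.5,
                0.0,  2.0), nrow = 2, byrow = TRUE)  # k x r bicluster means

# Expand to an n x p matrix: entry (i, j) is Mus[Cs[i], Ds[j]]
mus <- Mus[Cs, Ds]
dim(mus)    # 5 4
mus[1, ]    # -0.5  1.5  1.5 -0.5

# Hypothetical helper: penalized squared-error objective for given labels and means
penalized_sse <- function(x, Cs, Ds, Mus, lambda) {
  sum((x - Mus[Cs, Ds])^2) + lambda * sum(abs(Mus))
}
x <- mus + 0.1
penalized_sse(x, Cs, Ds, Mus, lambda = 2)  # 20 * 0.01 + 2 * 4 = 8.2
```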

Kean Ming Tan and Daniela Witten

KM Tan and D Witten (2014) Sparse biclustering of transposable data. Journal of Computational and Graphical Statistics 23(4):985-1008.

sparseBC.BIC, sparseBC.choosekr, summary.sparseBC, image.sparseBC

##############################################
# Example from Figure 1 in the manuscript
# A toy example to illustrate the results from k-means and sparse biclustering
##############################################

# Load the package and generate the data matrix x
library(sparseBC)
set.seed(1)
n<-100
p<-200
k<-5
r<-5
truthCs<-rep(1:k, each=(n/k))
truthDs<-rep(1:r, each=(p/r))
mus<-runif(k*r,-3,3)
mus<-matrix(c(mus),nrow=k,ncol=r,byrow=FALSE)
x<-matrix(rnorm(n*p,mean=0,sd=5),nrow=n,ncol=p)

# Add the bicluster means to x and record the true mean matrix
musmatrix<-matrix(NA,nrow=n,ncol=p)
for(i in 1:max(truthCs)){
  for(j in 1:max(truthDs)){
    x[truthCs==i,truthDs==j]<-x[truthCs==i,truthDs==j]+mus[i,j]
    musmatrix[truthCs==i,truthDs==j]<-mus[i,j]
  }
}

# Perform k-means on the rows and on the columns, then fill each entry of km.mus
# with the mean of x over the resulting bicluster
km.Cs<-kmeans(x,k,nstart=20)$cluster
km.Ds<-kmeans(t(x),r,nstart=20)$cluster
km.mus<-matrix(NA,nrow=n,ncol=p)
for(i in 1:n){
  for(j in 1:p){
    km.mus[i,j]<-mean(x[km.Cs==km.Cs[i],km.Ds==km.Ds[j]])
  }
}

# Perform sparse biclustering with 5 row clusters and 5 column clusters and lambda=0
bicluster<-sparseBC(x,5,5,0)


# Display a summary of the sparseBC object
summary(bicluster)


# Image plots to illustrate the estimated mean matrix
par(mfrow=c(2,2))
image(t(x),main="x")
image(t(musmatrix),main="Mean Matrix")
image(t(km.mus),main="Kmeans")
image(t(bicluster$mus),main="sparseBC")

# Built-in image plot for object sparseBC
image(bicluster)

Loading required package: glasso
Summary for the object "sparseBC"
Call:
	sparseBC(x = x, k = 5, r = 5, lambda = 0)

Cluster labels for the rows:
  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 [38] 2 2 2 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [75] 3 3 3 3 3 3 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5

Cluster labels for the columns:
  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [38] 1 1 1 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [75] 4 4 4 4 4 4 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[112] 3 3 3 3 3 3 3 3 3 2 2 4 2 2 2 2 2 2 3 2 2 3 2 2 2 2 2 2 2 2 2 2 2 3 2 2 2
[149] 2 2 2 2 2 2 2 2 2 2 2 2 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
[186] 5 5 5 5 5 5 5 5 5 5 4 5 5 5 5

The estimated bicluster means:
           [,1]         [,2]       [,3]       [,4]       [,5]
[1,] -1.5302545  0.001627367 -1.7449581  2.2860496  2.5431882
[2,] -0.7248690  1.592563205 -1.7817545  2.6979107 -1.5859438
[3,]  2.4123244 -0.689175157 -0.8520232  0.7717082 -2.3210452
[4,]  0.6574947  3.364659982  1.0812713  0.9030033  0.5547726
[5,] -1.7889230  1.504187455  1.7509216 -2.2402360 -1.4115348
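
The double loop that builds km.mus in the example recomputes a block mean for every entry of x, which is slow for larger matrices. A vectorized base-R alternative is sketched below, assuming labels are consecutive integers starting at 1 with no empty cluster (as kmeans output normally is); `bicluster_mean_matrix` is a hypothetical helper, not part of sparseBC.

```r
# Hypothetical helper (not part of sparseBC): n x p matrix whose (i, j) entry is
# the mean of x over the bicluster containing (i, j). Assumes Cs takes values
# 1..k and Ds takes values 1..r, with every cluster non-empty.
bicluster_mean_matrix <- function(x, Cs, Ds) {
  sums  <- t(rowsum(t(rowsum(x, Cs)), Ds))     # k x r block sums
  sizes <- outer(tabulate(Cs), tabulate(Ds))   # k x r block sizes
  unname((sums / sizes)[Cs, Ds])               # expand back to n x p
}

# Check against the entry-by-entry loop used for km.mus
set.seed(4)
x  <- matrix(rnorm(12 * 8), nrow = 12, ncol = 8)
Cs <- rep(1:3, length.out = 12)
Ds <- rep(1:2, each = 4)
fast <- bicluster_mean_matrix(x, Cs, Ds)
slow <- matrix(NA, nrow = 12, ncol = 8)
for (i in 1:12) {
  for (j in 1:8) {
    slow[i, j] <- mean(x[Cs == Cs[i], Ds == Ds[j]])
  }
}
all.equal(fast, slow)   # TRUE
```

In the example above, the same helper would be called as bicluster_mean_matrix(x, km.Cs, km.Ds). Each block sum is computed once, so the cost grows with the size of x rather than with the square of it.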