Description Usage Arguments Details Value Author(s) References See Also Examples

View source: R/sparseBC.choosekr.R

Perform cross-validation to select K (number of row clusters) and R (number of column clusters) for sparse biclustering. We assume that lambda is known.

1 | ```
sparseBC.choosekr(x, k, r, lambda, percent = 0.1, trace=FALSE)
``` |

`x` |
Data matrix; samples are rows and columns are features. Cannot contain missing values. |

`k` |
A range of values of K to be considered. Values considered must be an increasing sequence. |

`r` |
A range of values of R to be considered. Values considered must be an increasing sequence. |

`lambda` |
Non-negative regularization parameter for lasso. lambda=0 means no regularization. |

`percent` |
Percentage of elements of x to be left out for cross-validation. 1 must be divisible by the specified percentage. The default value is 0.1. |

`trace` |
Print out progress as iterations are performed. Default is FALSE. |

The function performs cross-validation as described in Algorithm (2) in Tan and Witten (2014) 'Sparse biclustering of transposable data'. Briefly, it works as follows: (1) some percent of the elements of x is removed at random from the data matrix - call those elements missing elements, (2) the missing elements are imputed using the mean of the other elements of the matrix, (3) sparse biclustering is performed with various values of K and R and the mean matrix is estimated, (4) calculate the sum of squared error of the missing values between x and the estimated mean matrix. This procedure is repeated 1/percent times. Finally, we select K and R based on the criterion described in Algorithm (2) in Tan and Witten (2014). A similar procedure is used in Witten et al (2009) 'A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis'.

Note that sparseBC is run with center=TRUE in this function.

`estimated_kr` |
The chosen values of K and R based on cross-validation. |

`results.mean` |
Mean squared error for all values of K and R considered. |

`results.se` |
Standard error of the sum of squared error for all values of K and R considered. |

Kean Ming Tan and Daniela Witten

KM Tan and D Witten (2014) Sparse biclustering of transposable data. *Journal of Computational and Graphical Statistics* 23(4):985-1008.

D Witten, R Tibshirani, and T Hastie (2009) A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis, *Biostatistics* 10(3), 515–534.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 | ```
########### Create data matrix with K=2 R=4 row and column clusters
#k <- 2
#r <- 4
#n <- 200
#p <- 200
# mus<-runif(k*r,-3,3)
# mus<-matrix(c(mus),nrow=k,ncol=r,byrow=FALSE)
# truthCs<-sample(1:k,n,rep=TRUE)
# truthDs<-sample(1:r,p,rep=TRUE)
# x<-matrix(rnorm(n*p,mean=0,sd=2),nrow=n,ncol=p)
# for(i in 1:max(truthCs)){
# for(j in 1:max(truthDs)){
# x[truthCs==i, truthDs==j] <- x[truthCs==i, truthDs==j] + mus[i,j]
# }
# }
# x<-x-mean(x)
# Example is commented out for short run-time
############ Perform sparseBC.choosekr to choose the number of row and column clusters
#sparseBC.choosekr(x,1:5,1:5,0,0.2)$estimated_kr
``` |

sparseBC documentation built on May 30, 2017, 3:05 a.m.

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.