low_rank: Implementation of HSSVD framework for biclustering with...

Description

HSSVD is a recently developed data mining tool for discovering subgroups of patients and genes which simultaneously display unusual levels of variability compared to other genes and patients. Previous biclustering methods were restricted to mean level detection, while the new method can detect both mean and variance biclusters.

Usage

low_rank(x, ranks = c(rep(NA,3)), sparse = rep(FALSE, 3), 
est = rep("wold", 3), add=c(1,0,0),...)

Arguments

x

Data matrix to bicluster. If the data range is [0, 1], a logit transform is recommended; if it is [0, inf), a log transform is recommended. Otherwise, no transform or standardization is needed.

ranks

Integer vector (dim=3) giving the ranks for the rescale, mean approximation, and variance approximation steps. The default is c(NA,NA,NA), which means that Wold- or Gabriel-style cross-validation will be used for rank estimation.

sparse

Vector of logical indicators (dim=3), one per step. TRUE means that sparse left and right singular vectors are needed. The default is c(FALSE,FALSE,FALSE); see Details for insight.

est

Character vector (dim=3) giving the estimation method for each step. Only the "GB" and "Wold" methods are available. The default is "Wold".

add

Integer vector (dim=3), where 0 <= add <= 2, which is added to the rank estimate at each step for dense matrices.

...

Additional input variables for FIT.SSVD. See Details section for more information.
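
The transform recommendation for x can be sketched in base R. This is only an illustration: the helper name transform_for_hssvd and the clipping constant eps are assumptions introduced here, not part of the package, and the clipping is added solely to keep the logit and log finite at the boundaries. Choosing the transform from the observed range is a heuristic; the intended scale of the data should take precedence.

```r
# Hypothetical helper (not part of HSSVD): pick a transform from the data range.
transform_for_hssvd <- function(x, eps = 1e-6) {
  rng <- range(x, na.rm = TRUE)
  if (rng[1] >= 0 && rng[2] <= 1) {
    # Data in [0, 1]: logit transform; the eps clipping is an assumption
    p <- pmin(pmax(x, eps), 1 - eps)
    log(p / (1 - p))
  } else if (rng[1] >= 0) {
    # Non-negative data: log transform; the eps clipping is an assumption
    log(pmax(x, eps))
  } else {
    # Data that take negative values: leave unchanged
    x
  }
}

set.seed(1)
beta   <- matrix(runif(20), nrow = 4)             # values in [0, 1]
counts <- matrix(rexp(20, rate = 0.1), nrow = 4)  # non-negative values
signed <- matrix(rnorm(20), nrow = 4)             # values on the real line

z <- transform_for_hssvd(beta)
w <- transform_for_hssvd(counts)
u <- transform_for_hssvd(signed)
```

The transformed matrix keeps the shape of the input, so it can be passed on as the x argument.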

Details

The FIT-SSVD method (Yang, Ma, and Buja, 2011) assumes that more than half of the rows are approximately null rows with little or no signal, and similarly for the columns. Its row and column preselection steps for initial values and rank estimation will under-select informative rows/columns if all rows/columns are informative or if the number of rows/columns is not large enough, both of which can happen in biological research. In practice, therefore, the initial value is the result of the ordinary singular value decomposition. For rank estimation, the Gabriel-style "block" holdout tends to give a larger rank estimate than the Wold-style "speckled" holdout. In our experience, the Wold method may work better for large data sets. The rank estimation steps are slow, especially with the Wold-style method, so we suggest running in batch mode or on a computing cluster if rank estimation is needed.
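
The difference between the two holdout patterns can be pictured with a pair of logical masks. The sketch below uses base R only and is not the package's cross-validation code; the matrix size and the number of held-out entries are arbitrary choices made for illustration.

```r
# Illustrative masks only; this is not the package's cross-validation code.
set.seed(42)
n <- 6
p <- 8

# Gabriel-style "block" holdout: a set of rows crossed with a set of columns
held_rows <- sample(n, 2)
held_cols <- sample(p, 3)
block_mask <- matrix(FALSE, n, p)
block_mask[held_rows, held_cols] <- TRUE

# Wold-style "speckled" holdout: individual entries chosen at random
speckled_mask <- matrix(FALSE, n, p)
speckled_mask[sample(n * p, 6)] <- TRUE

sum(block_mask)     # 2 * 3 = 6 held-out entries forming a rectangle
sum(speckled_mask)  # 6 held-out entries scattered over the matrix
```

In the block pattern every held-out row loses the same columns, whereas the speckled pattern removes isolated entries, so the two schemes assess a candidate rank on differently structured missing data.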

The FIT.SSVD function has several input variables that the user can adjust if desired. These are currently limited to:
dothres : 'hard' or 'soft' thresholding; default='hard'
n.step : maximum number of iterations in Algorithm 1; default=100
n.err : number of bootstrap samples in Algorithm 3; default=100
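
For readers unfamiliar with the two thresholding rules, the sketch below gives their standard textbook definitions in base R. The function names hard_threshold and soft_threshold and the threshold value are introduced here for illustration; it is assumed, not verified, that the 'hard' and 'soft' options correspond to these definitions.

```r
# Standard definitions; assumed (not verified) to match the 'hard' and 'soft' options.
hard_threshold <- function(x, lambda) ifelse(abs(x) > lambda, x, 0)
soft_threshold <- function(x, lambda) sign(x) * pmax(abs(x) - lambda, 0)

v <- c(-3, -0.5, 0.2, 1.5)
h <- hard_threshold(v, 1)  # keeps large entries unchanged: -3, 0, 0, 1.5
s <- soft_threshold(v, 1)  # also shrinks them toward zero: -2, 0, 0, 0.5
```

Hard thresholding leaves surviving entries untouched, while soft thresholding shrinks them by the threshold. Following the Arguments section, such options are passed to low_rank through the ... argument, for example dothres = 'soft'.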

Value

call

record of call to low_rank

rescale

Sparse SVD approximation result of the rescale step. Typically, an object of this type contains the singular values (d) and the left (u) and right (v) singular vectors. (Step 2)

result_mean

Sparse SVD approximation result of Y. (Step 3)

result_var

Sparse SVD approximation result of Z. (Step 4)

ranks

vector of estimated ranks for (U, Y_tilde, Z_tilde).

bgmean

Mean for null cluster. (Step 5)

bgstd

Standard deviation for null cluster. (Step 5)

back

Logical matrix where TRUE=in null cluster.

mean_app

Mean approximation of X. (Step 6)

std_app

Standard deviation approximation of X. (Step 6)
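
The SVD-type results above are described as holding singular values (d) and left (u) and right (v) singular vectors. As a minimal sketch of how such components combine into a low-rank approximation, the example below uses base::svd on simulated data; it assumes, without confirmation from the package, that the returned components follow the same convention as base::svd, and the rank k is an arbitrary illustrative choice.

```r
# Sketch with base::svd; assumes the package's u, d, v follow the same convention.
set.seed(7)
X <- matrix(rnorm(30), nrow = 5)
s <- svd(X)

k <- 2  # rank of the approximation (illustrative choice)
approx_k <- s$u[, 1:k, drop = FALSE] %*%
  diag(s$d[1:k], k) %*%
  t(s$v[, 1:k, drop = FALSE])

dim(approx_k)                 # same shape as X
err <- norm(X - approx_k, "F")  # Frobenius error of the rank-k approximation
```

For a plain SVD the Frobenius error equals the square root of the sum of the squared discarded singular values, which gives a quick check that the components were combined correctly.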

Author(s)

Authors: Guanhua Chen and Michael R. Kosorok
Contributor: Shannon T. Holloway <sthollow@ncsu.edu>
Maintainer: Guanhua Chen <guanhuac@live.unc.edu>

References

Chen G, Sullivan PF, Kosorok MR (2013) Biclustering with heterogeneous variance. Proceedings of the National Academy of Sciences, doi:10.1073/pnas.1304376110.

Owen AB, Perry PO (2009) Bi-cross-validation of the SVD and the nonnegative matrix factorization. The Annals of Applied Statistics 3:564-594.

Yang D, Ma Z, Buja A (2011) A sparse SVD method for high-dimensional data. arXiv:1112.2433.

Examples

library(HSSVD)
data(Methylation)
## Select the columns that hold the average beta values
beta_name <- colnames(Methylation)[grep("AVG_Beta", colnames(Methylation))]
ds <- as.matrix(Methylation[beta_name])
info <- t(ds)
## The methylation data take values in [0,1];
## therefore, we apply a logit transformation ##
info <- log(info/(1-info))

## Ranks are supplied directly, so no cross-validated rank estimation is run
result <- low_rank(info, ranks=c(8,4,4))

HSSVD documentation built on May 2, 2019, 4:24 a.m.