README.md

Description

Performance enhancements and SciDB matrix support for the s4vd biclustering method of Lee, Shen, Huang, and Marron (M. Lee, H. Shen, J. Huang, and J. S. Marron, "Biclustering via Sparse Singular Value Decomposition," Biometrics 66, pp. 1087-1095, December 2010).

The package vignette summarizes the modifications: s4vdp4.pdf

NOTE: This package relies on recent versions of the scidb package for R.

Package Installation

Install this package directly from GitHub using the devtools package:

library("devtools")
install_github("Paradigm4/s4vdp4")

Install the latest SciDB package for R with:

install_github("Paradigm4/SciDBR", ref="laboratory", quick=TRUE)

Example

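# Baseline: the standard s4vd implementation on the first 2000 rows
library("s4vd")
data(lung)
A = lung[1:2000,]
cat("Starting standard s4vd\n")
t1 = proc.time()
x = biclust(A, method=BCssvd, K=1)
print(proc.time()-t1)

# Load the Paradigm4 version; the calls below use its implementation
library("s4vdp4")

# Same data as an ordinary in-memory R matrix
cat("In-memory P4 s4vd\n")
X = A
t1 = proc.time()
x1 = biclust(X, method=BCssvd, K=1)
print(proc.time()-t1)

# Same data uploaded to SciDB; this assumes a SciDB connection
# is already open (for example via scidbconnect())
cat("Partly in-database P4 s4vd\n")
X = as.scidb(A)
t1 = proc.time()
x2 = biclust(X, method=BCssvd, K=1)
print(proc.time()-t1)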
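
The example above repeats the same proc.time() pattern three times. As a minimal sketch, the timing could be wrapped in a small helper. The name timed is hypothetical and not part of s4vd or s4vdp4, and a base R svd() call stands in for biclust() so the snippet runs without either package or a SciDB server.

```r
# Hypothetical helper (not part of s4vd or s4vdp4): evaluate an
# expression, print the elapsed wall-clock time, and return its value.
timed <- function(label, expr) {
  # expr is a lazy promise, so it is only evaluated inside system.time()
  elapsed <- system.time(value <- expr)[["elapsed"]]
  cat(sprintf("%s: %.3f s elapsed\n", label, elapsed))
  invisible(value)
}

# Stand-in workload: a rank-one SVD of a random matrix.
set.seed(1)
M <- matrix(rnorm(200 * 50), nrow = 200, ncol = 50)
s <- timed("Base R svd", svd(M, nu = 1, nv = 1))
```

With the packages loaded, the same helper could wrap each call in the example, for instance timed("In-memory P4 s4vd", biclust(X, method=BCssvd, K=1)).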

Future work

We've moved the large matrix-vector products and matrix factorizations into SciDB. However, two vectors are returned to R for additional processing in each iteration, and this data transfer back and forth is a bottleneck. We'd like to move more (all?) of the algorithm into SciDB in a future version, while still scripting the overall program from R.
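
To make the per-iteration structure concrete, here is a simplified sketch of a rank-one sparse SVD iteration in plain R. It is not the code used by s4vd or s4vdp4: the function name ssvd1 and the fixed penalties lambda_u and lambda_v are assumptions chosen for illustration, and the packages select their penalties differently. The sketch only shows where the large matrix-vector products occur and where two vectors, u and v, receive cheap element-wise processing in each pass.

```r
# Soft-thresholding: shrink each entry toward zero by lambda.
soft <- function(z, lambda) sign(z) * pmax(abs(z) - lambda, 0)

# Illustrative rank-one sparse SVD with fixed penalties (an assumption;
# not the penalty selection used by s4vd or s4vdp4).
ssvd1 <- function(A, lambda_u, lambda_v, max_iter = 100, tol = 1e-6) {
  s <- svd(A, nu = 1, nv = 1)   # matrix factorization (large operation)
  u <- s$u[, 1]
  v <- s$v[, 1]
  for (i in seq_len(max_iter)) {
    # Large matrix-vector product t(A) %*% u, then vector-only work on v
    v_new <- soft(drop(crossprod(A, u)), lambda_v)
    if (all(v_new == 0)) stop("lambda_v is too large: v vanished")
    v_new <- v_new / sqrt(sum(v_new^2))

    # Large matrix-vector product A %*% v, then vector-only work on u
    u_new <- soft(drop(A %*% v_new), lambda_u)
    if (all(u_new == 0)) stop("lambda_u is too large: u vanished")
    u_new <- u_new / sqrt(sum(u_new^2))

    delta <- max(sqrt(sum((u_new - u)^2)), sqrt(sum((v_new - v)^2)))
    u <- u_new
    v <- v_new
    if (delta < tol) break
  }
  list(u = u, v = v, d = drop(crossprod(u, A %*% v)), iterations = i)
}

# Synthetic data: low-level noise plus a planted 10 x 5 block.
set.seed(1)
A <- matrix(rnorm(100 * 50, sd = 0.1), nrow = 100, ncol = 50)
A[1:10, 1:5] <- A[1:10, 1:5] + 5
fit <- ssvd1(A, lambda_u = 1, lambda_v = 1)
rows_in_bicluster <- which(fit$u != 0)
cols_in_bicluster <- which(fit$v != 0)
```

In a layout like the one described above, the two products involving A would run in the database while the thresholding and normalization of u and v run in R, which is why two vectors cross the boundary on every pass.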



Paradigm4/s4vdp4 documentation built on May 8, 2019, 12:55 a.m.