In ttriche/bayesCC: Bayesian consensus clustering

View source: R/earlyReduction.R

earlyReduction

R Documentation

do dimension reduction (via NMF or SVD) before Bayesian consensus clustering

Description

If `how = "NMF"`, the rank can be estimated by five-fold cross-validation on NAs, though this can be slow. The underlying rationale is that whatever rank K best recovers artificially missing data (knocked out column-wise, row-wise, or randomly across both) is the best estimable rank we are likely to recover. To stabilize the estimate of K, we can run five-fold cross-validation and rotate the NAs (set at a default of 20%, i.e. `pctNA = 0.2`).
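
The rank-finding idea described above can be illustrated with a small base-R sketch. It is not the package's implementation: `masked_nmf()` and `cv_rank_error()` are hypothetical helpers, the NMF is a plain multiplicative-update fit that ignores held-out entries, and the toy matrix is simulated. Each entry is held out exactly once across five folds (20% per fold), and the candidate rank with the lowest held-out error is taken as the estimate.

```r
# Base-R sketch; masked_nmf() and cv_rank_error() are hypothetical helpers,
# not functions provided by bayesCC.

# NMF by multiplicative updates that ignores entries where `observed` is FALSE.
masked_nmf <- function(X, k, observed, iters = 200, eps = 1e-9) {
  X[!observed] <- 0  # held-out entries contribute nothing to the updates
  W <- matrix(runif(nrow(X) * k), nrow(X), k)
  H <- matrix(runif(k * ncol(X)), k, ncol(X))
  for (i in seq_len(iters)) {
    H <- H * (t(W) %*% X) / (t(W) %*% ((W %*% H) * observed) + eps)
    W <- W * (X %*% t(H)) / (((W %*% H) * observed) %*% t(H) + eps)
  }
  list(W = W, H = H)
}

# Rotate the held-out entries through `folds` disjoint sets (20% each for
# 5 folds) and average the held-out squared error for every candidate rank.
cv_rank_error <- function(X, ranks, folds = 5) {
  fold <- matrix(sample(rep_len(seq_len(folds), length(X))), nrow(X), ncol(X))
  sapply(ranks, function(k) {
    mean(sapply(seq_len(folds), function(f) {
      fit <- masked_nmf(X, k, observed = fold != f)
      mean(((fit$W %*% fit$H) - X)[fold == f]^2)
    }))
  })
}

set.seed(1)
# Toy non-negative matrix of known rank 3 (features in rows, samples in columns)
X <- matrix(runif(40 * 3), 40, 3) %*% matrix(runif(3 * 30), 3, 30)
ranks <- 1:5
errs <- cv_rank_error(X, ranks)
best_k <- ranks[which.min(errs)]
```

Every candidate rank costs one NMF fit per fold, which is why this kind of search can be slow on real data.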

Usage

earlyReduction(
  mat,
  how = c("NMF", "SVD"),
  mat2 = NULL,
  joint = FALSE,
  findK = FALSE,
  howNA = c("both", "column", "row"),
  viaCV = FALSE,
  pctNA = 0.2
)

Arguments

`mat`	a matrix to decompose (columns are samples, rows are features)
`how`	one of "NMF" or "SVD"; SVD is likely to be much faster
`mat2`	a 2nd matrix to reduce (optional; for joint factorization)
`joint`	if using NMF, should joint factorization be attempted?
`findK`	if using marginal NMF, should the optimal rank(s) be sought?
`howNA`	for rank finding, add NAs column-wise, row-wise, or both?
`viaCV`	for rank finding, should five-fold CV be used when imputing?
`pctNA`	for rank finding, what fraction of the data should be set to NA?
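
As a rough illustration of what the three `howNA` schemes could mean, the sketch below builds a logical mask of entries to knock out. `make_na_mask()` is a hypothetical helper, and the interpretation of each scheme (a fixed fraction within every column, within every row, or uniformly across the matrix) is an assumption rather than something taken from the package source.

```r
# Hypothetical helper (not part of bayesCC): returns a logical matrix marking
# the entries to set to NA. The meaning given to each scheme is an assumption.
make_na_mask <- function(mat, howNA = c("both", "column", "row"), pctNA = 0.2) {
  howNA <- match.arg(howNA)
  nr <- nrow(mat)
  nc <- ncol(mat)
  mask <- matrix(FALSE, nr, nc)
  if (howNA == "column") {
    # within every column, hide the same fraction of rows
    for (j in seq_len(nc)) mask[sample.int(nr, round(pctNA * nr)), j] <- TRUE
  } else if (howNA == "row") {
    # within every row, hide the same fraction of columns
    for (i in seq_len(nr)) mask[i, sample.int(nc, round(pctNA * nc))] <- TRUE
  } else {
    # hide entries uniformly at random across the whole matrix
    mask[sample.int(nr * nc, round(pctNA * nr * nc))] <- TRUE
  }
  mask
}

set.seed(42)
mat <- matrix(rexp(50 * 20), nrow = 50, ncol = 20)  # features x samples
masked <- mat
masked[make_na_mask(mat, "column")] <- NA
colMeans(is.na(masked))  # fraction of missing entries per column
```

With `howNA = "column"` and the default fraction, every column of this toy matrix loses 10 of its 50 entries.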

Details

Joint NMF can also be requested (as in Wang et al., Bioinformatics 2015, doi: 10.1093/bioinformatics/btu679), but in this case the ranks can only be estimated marginally. Joint rank estimation (and, by extension, optimal joint imputation for linked views) is an open research topic as far as we can tell. If anyone wants to send a patch, we will gladly apply it, and a great many people will probably start using it thereafter.
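
For readers unfamiliar with joint factorization, here is a minimal base-R sketch of one common formulation, in which two views measured on the same samples share a single coefficient matrix H. It is not claimed to match the method of Wang et al. or the code path taken when `joint = TRUE`; `joint_nmf()` is a hypothetical helper, and the rank `k` is supplied by hand because, as noted above, joint rank estimation is not available.

```r
# Hypothetical sketch: with an unweighted squared-error objective, sharing H
# across two views amounts to factorizing the row-stacked matrix.
joint_nmf <- function(mat, mat2, k, iters = 300, eps = 1e-9) {
  stopifnot(ncol(mat) == ncol(mat2))  # same samples, in the same order
  X <- rbind(mat, mat2)
  W <- matrix(runif(nrow(X) * k), nrow(X), k)
  H <- matrix(runif(k * ncol(X)), k, ncol(X))
  for (i in seq_len(iters)) {
    # standard multiplicative updates for squared-error NMF
    H <- H * (t(W) %*% X) / (t(W) %*% W %*% H + eps)
    W <- W * (X %*% t(H)) / (W %*% H %*% t(H) + eps)
  }
  idx <- seq_len(nrow(mat))
  list(W1 = W[idx, , drop = FALSE],   # feature loadings for the first view
       W2 = W[-idx, , drop = FALSE],  # feature loadings for the second view
       H = H)                         # shared components x samples matrix
}

set.seed(3)
view1 <- matrix(rexp(60 * 15), 60, 15)  # 60 features x 15 samples
view2 <- matrix(rexp(25 * 15), 25, 15)  # 25 features x the same 15 samples
fit <- joint_nmf(view1, view2, k = 4)
dim(fit$H)
```

Because the objective here is unweighted, a view with more features or a larger scale will dominate the shared factors; rescaling the views beforehand is one way to balance them.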

If `how = "SVD"`, the rank will be whatever the data supports, i.e. `min(nrow(mat), ncol(mat))`.
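
The SVD rank statement can be checked directly with base R's `svd()`. This is a sketch on simulated data, not the package's code; in particular, the `scores` matrix shown is just one conventional way to obtain a reduced per-sample representation and is an assumption about what a caller might want.

```r
set.seed(7)
mat <- matrix(rnorm(100 * 12), nrow = 100, ncol = 12)  # 100 features x 12 samples

dec <- svd(mat)
length(dec$d)  # 12, i.e. min(nrow(mat), ncol(mat)) singular values

# Reduced representation of the samples: components in rows, samples in columns
scores <- diag(dec$d) %*% t(dec$v)
dim(scores)    # 12 12
```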