nnlm.cv: Cross-validation for NMF
In zdebruine/scNMF: Fast Non-negative matrix factorization toolkit for single cell data

Description

Cross-validation of NMF rank, based on the angle between the factorizations of the two halves of a bipartitioned dataset.

Usage

nnlm.cv(
  A,
  byrow = TRUE,
  k = seq(from = 5, to = 20, by = 2),
  max.iter = 1000,
  rel.tol = 0.001,
  n.threads = 0,
  verbose = 1,
  trace = 5,
  seed = 123,
  n.starts = 1,
  alpha = c(0, 0, 0.5),
  beta = c(0, 0, 0.5),
  return.models = FALSE,
  smart.split = TRUE,
  smart.split.block.size = 200,
  reduction = "dclus",
  dist.method = "cosine"
)

Arguments

`A`	A matrix to be factorized (e.g. the result of average.expression), or a Seurat object with cluster centers in a dimensional reduction slot. A sparse matrix will be coerced to dense format. To use the entire data from a Seurat object, specify reduction = NULL.
`byrow`	Bipartition by rows rather than columns (default TRUE)
`k`	Integer vector of ranks to evaluate (default seq(from = 5, to = 20, by = 2))
`max.iter`	Maximum number of alternating NNLS solves (default 1000)
`rel.tol`	Stop criterion for each NMF run, defined as the relative change in error between two successive iterations: |e2 - e1| / avg(e1, e2) (default 1e-3). A looser tolerance such as 1e-2 may be useful for faster, coarse-grained preliminary analysis of large datasets, while small datasets may benefit from a tighter tolerance such as 1e-4.
`n.threads`	Number of threads/CPUs to use (default is 0, for all cores)
`verbose`	Verbosity level: 0 = no output, 1 = a progress bar for each start (see n.starts), 2 = a message for each factorization, 3 = full details for each factorization
`trace`	Integer; the MSE is calculated and checked for convergence against rel.tol every trace iterations of ANLS NMF. To check the error at every iteration, specify 1. To skip error checking entirely, specify trace > max.iter. The default of 5 is generally an efficient and effective value. For particularly sparse or heterogeneous datasets that require hundreds of ANLS iterations, a trace of 10 or 20 may speed up the calculation slightly.
`seed`	Random seed for reproducibility.
`n.starts`	Number of random starts; each start uses a unique set of bipartition indices and is run at every value of k (default 1)
`return.models`	Logical; should the W and H matrices be returned for each run? (default FALSE). These matrices can take up significant memory in large cross-validation experiments.
`smart.split`	Logical; whether to use a smart split to determine the bipartition indices when n.starts = 1 (default TRUE). A smart split maximizes the signal redundancy between the two halves of the dataset to achieve optimal cross-validation results. Generally, a single run with smart.split is as informative as multiple runs on random subsets.
`smart.split.block.size`	Integer giving the number of features on which bipartite matching is run at a time (default 200). The bipartite graph solver is the rate-limiting step, so smaller blocks are faster. With small blocks, matched features are less similar; with large blocks, matched features are more similar, redundant features are better separated, and the cross-validation result may be better.
`reduction`	If a Seurat object is provided, the name of the reduction whose feature loadings (i.e. cluster centers) should be used (default "dclus"), or NULL to use the entire counts matrix from the default assay.
`dist.method`	Distance metric for computing distances between clusters and a similarity graph: "cosine" (default) or "bhjattacharyya" (Bhattacharyya distance). In exceptionally sparse datasets, Bhattacharyya distance can outperform cosine distance.
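The way rel.tol, trace, and max.iter interact can be sketched in base R. This is a minimal illustration under stated assumptions, not scNMF's implementation: the helper stop_iteration() and the toy error curve mse_at() are hypothetical, the ANLS update itself is omitted, and the relative change is assumed to be computed between successive error checks (that is, trace iterations apart).

```r
# Hypothetical helper (not part of scNMF): given a function returning the MSE
# after `iter` iterations, report the iteration at which a rel.tol / trace
# stopping rule would halt. Assumption: errors are compared between successive
# checks, which are `trace` iterations apart.
stop_iteration <- function(mse_at, max.iter = 1000, rel.tol = 1e-3, trace = 5) {
  e_prev <- NA_real_
  for (iter in seq_len(max.iter)) {
    # (one alternating NNLS update of W and H would happen here)
    if (iter %% trace != 0) next            # error is only checked every `trace` iterations
    e_cur <- mse_at(iter)
    if (!is.na(e_prev)) {
      rel <- abs(e_cur - e_prev) / mean(c(e_prev, e_cur))  # |e2 - e1| / avg(e1, e2)
      if (rel < rel.tol) return(iter)
    }
    e_prev <- e_cur
  }
  max.iter                                  # no convergence: stopped at max.iter
}

# Toy error curve that decays towards 1
mse_at <- function(iter) 1 + 1 / iter
stop_iteration(mse_at)                      # 75 with rel.tol = 1e-3, trace = 5
stop_iteration(mse_at, trace = 1001)        # 1000: trace > max.iter, error never checked
```

Checking only every trace iterations may cost a few extra iterations past the point of convergence, but it avoids computing the MSE at every step, which is why larger trace values can be faster on datasets that need hundreds of iterations.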

Details

nnlm.cv splits the dataset into two non-overlapping halves, by row or by column, and runs NMF on each half at every rank in k. Factors in the two models are matched one-to-one by cosine similarity, and the angle between the two models is calculated as the mean of the angles between matched factors. The rank k with the minimum mean angle is the rank at which the latent space is most robust.
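The matching and mean-angle computation described above can be sketched in base R. This is an illustration under assumptions, not scNMF's code: model_angle() is a hypothetical helper, factors are assumed to be the columns of two matrices over the shared dimension (for a row bipartition, transpose the two H matrices first), angles are reported in degrees, and a simple greedy one-to-one matching is used where scNMF may use a different (for example optimal) assignment.

```r
# Hypothetical helper (not part of scNMF): mean angle, in degrees, between two
# NMF models whose factors are the columns of W1 and W2 (same number of rows).
model_angle <- function(W1, W2) {
  stopifnot(ncol(W1) == ncol(W2), nrow(W1) == nrow(W2))
  # cosine similarity between every factor of model 1 and every factor of model 2
  sim <- crossprod(W1, W2) / outer(sqrt(colSums(W1^2)), sqrt(colSums(W2^2)))
  sim[is.na(sim)] <- 0                      # all-zero factors: treat as orthogonal
  angles <- numeric(ncol(W1))
  for (i in seq_along(angles)) {
    # greedy one-to-one matching: take the most similar remaining pair ...
    best <- which(sim == max(sim), arr.ind = TRUE)[1, ]
    angles[i] <- acos(min(1, sim[best[1], best[2]])) * 180 / pi
    # ... then exclude both factors from further matching
    sim[best[1], ] <- -Inf
    sim[, best[2]] <- -Inf
  }
  mean(angles)
}

W1 <- cbind(c(1, 0), c(0, 1))
W2 <- cbind(c(1, 1), c(0, 1))
model_angle(W1, W2)   # 22.5: factors matched at 0 and 45 degrees
```

Identical models up to a permutation of factors give an angle of 0, so a lower mean angle indicates that both halves recovered the same latent space.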

The cross-validation procedure can be repeated on multiple random partitions of the dataset (n.starts > 1). If only a single run is requested (n.starts = 1), a "smart split" (not fully random) is applied, which maximizes signal redundancy between the two halves of the dataset. Generally, a single run with smart.split is sufficient to determine the optimal rank k and captures most of the information that would be learned from multiple starts on entirely random partitions. The scNMF::canyon.plot function is useful for visualizing the results of nnlm.cv, both to determine the optimal rank k and to tune the cross-validation procedure. Once the optimal rank has been determined, scNMF::nnmf may be run at that rank.
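The smart split is only described at a high level here. One plausible reading of that description, together with smart.split.block.size, is sketched below in base R: within each block of features, highly similar features are paired and the two members of each pair are sent to opposite halves. Everything about this sketch is an assumption, not scNMF's algorithm: smart_split_sketch() is hypothetical, greedy pairing stands in for whatever bipartite matching scNMF performs, and it splits the columns of A (transpose A to split rows).

```r
# Hypothetical sketch of a "smart split" (NOT scNMF's implementation).
# Assumption: features (columns of A) are processed in blocks; within a block
# the most similar features are greedily paired by cosine similarity, and the
# two members of each pair go to opposite halves so both halves share signal.
smart_split_sketch <- function(A, block.size = 200, seed = 123) {
  set.seed(seed)                            # note: changes the global RNG state
  idx <- sample(ncol(A))                    # shuffle features before blocking
  blocks <- split(idx, ceiling(seq_along(idx) / block.size))
  half <- integer(ncol(A))                  # 1 or 2 for each feature
  for (b in blocks) {
    X <- A[, b, drop = FALSE]
    norms <- sqrt(colSums(X^2))
    sim <- crossprod(X) / outer(norms, norms)  # cosine similarity within block
    sim[is.na(sim)] <- 0                    # all-zero features: no similarity
    diag(sim) <- -Inf                       # a feature cannot pair with itself
    unpaired <- rep(TRUE, length(b))
    while (sum(unpaired) >= 2) {
      s <- sim
      s[!unpaired, ] <- -Inf                # ignore features already paired
      s[, !unpaired] <- -Inf
      p <- which(s == max(s), arr.ind = TRUE)[1, ]
      half[b[p[1]]] <- 1L                   # members of a pair go to
      half[b[p[2]]] <- 2L                   # opposite halves
      unpaired[p] <- FALSE
    }
    if (any(unpaired)) half[b[unpaired]] <- sample(1:2, 1)  # odd feature out
  }
  list(half1 = which(half == 1L), half2 = which(half == 2L))
}
```

The similarity matrix and the repeated matching inside each block grow quickly with block size, which illustrates why smart.split.block.size trades speed against how similar the matched features are.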

Subsetting: For large datasets, nnlm.cv can often be run on a subset of the data, provided the subset retains sufficient signal redundancy. If signal redundancy is insufficient, nnlm.cv may not reveal any "canyon" (a local minimum in the angle across values of k).

Value

A list of cross-validation results, most easily visualized by running scNMF::canyon.plot on it. The list includes a tall-format data frame of factor angles (factor.angle, with columns "k", "factor.angle", and "seed"), a tall-format data frame of model angles (model.angle, with columns "k", "model.angle", and "seed"), and, if models were requested, a list of starts, each containing the models and matched factors.
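The optimal rank can also be read off the model-angle table programmatically. A minimal sketch, assuming the returned list element is named model.angle and has the columns described above; best_rank() is a hypothetical helper, not part of scNMF, and it averages over starts before taking the minimum.

```r
# Hypothetical helper (not part of scNMF): choose the rank with the smallest
# mean model angle across starts. `angles` is assumed to be the tall data
# frame described above, with columns "k", "model.angle" and "seed".
best_rank <- function(angles) {
  per_k <- aggregate(model.angle ~ k, data = angles, FUN = mean)
  per_k$k[which.min(per_k$model.angle)]
}

# Possible usage, assuming scNMF is installed and the list element is named
# model.angle as described above:
# cv <- nnlm.cv(A, k = seq(from = 5, to = 20, by = 2))
# best_rank(cv$model.angle)
```

Plotting the full curve with scNMF::canyon.plot shows whether that minimum is a clear canyon or only a shallow dip.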

zdebruine/scNMF documentation built on Jan. 1, 2021, 1:50 p.m.
