README.md
In zdebruine/scNMF: Fast Non-negative matrix factorization toolkit for single cell data

scNMF: single cell non-negative matrix factorization toolkit

scNMF is a toolkit for:

fast divisive clustering of single cells to compress feature space
fast non-negative matrix factorization of these spaces
easy NMF cross-validation to find optimal factorization rank
projection of cells onto NMF coordinates for visualization
Gene Ontology term enrichment in deep NMF models

See the vignettes folder for a fast and gentle introduction to scNMF and a vignette to reproduce figures in the scNMF manuscript.

scNMF introduces a new method for cross-validation based on the robustness of NMF models on a bipartition of the input. Specifically,

the input matrix is split into halves by either rows or columns,
NMF is run on both halves,
factors in both models are paired based on bipartite matching on a cosine similarity graph,
the mean angular distance between the two models is computed as the mean cosine distance across all matched factor pairs.
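
The steps above can be sketched in base R. This is only an illustration under stated assumptions, not the code behind `scNMF::nnmf.cv`: `simple_nmf`, `cosine_sim`, `all_perms`, `match_factors`, and `model_distance` are hypothetical helper names, the solver is a plain multiplicative-update NMF, and the bipartite matching is done by brute force over permutations, which is only practical for small ranks.

```r
# Stand-in solver: multiplicative-update NMF (not the solver used by scNMF)
simple_nmf <- function(A, k, n_iter = 200, seed = 1) {
  set.seed(seed)
  W <- matrix(runif(nrow(A) * k), nrow(A), k)
  H <- matrix(runif(k * ncol(A)), k, ncol(A))
  eps <- 1e-9  # avoids division by zero
  for (iter in seq_len(n_iter)) {
    H <- H * (crossprod(W, A) / (crossprod(W) %*% H + eps))
    W <- W * (tcrossprod(A, H) / (W %*% tcrossprod(H) + eps))
  }
  list(W = W, H = H)
}

# Cosine similarity between every column of X and every column of Y
cosine_sim <- function(X, Y) {
  Xn <- sweep(X, 2, pmax(sqrt(colSums(X^2)), 1e-12), "/")
  Yn <- sweep(Y, 2, pmax(sqrt(colSums(Y^2)), 1e-12), "/")
  crossprod(Xn, Yn)
}

# All permutations of a vector (exponential cost, fine for small k only)
all_perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in all_perms(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  }
  out
}

# Exact bipartite matching by exhaustive search: p[i] is the column of the
# second model paired with column i of the first model
match_factors <- function(S) {
  k <- nrow(S)
  perms <- all_perms(seq_len(k))
  scores <- vapply(perms, function(p) sum(S[cbind(seq_len(k), p)]), numeric(1))
  perms[[which.max(scores)]]
}

# Mean cosine distance (1 - cosine similarity) over matched factor pairs
model_distance <- function(W1, W2) {
  S <- cosine_sim(W1, W2)
  p <- match_factors(S)
  1 - mean(S[cbind(seq_along(p), p)])
}

# Usage: split a rank-3 matrix by columns, so both halves share the same rows
# and their W matrices can be compared factor by factor
set.seed(42)
A <- matrix(runif(60 * 3), 60, 3) %*% matrix(runif(3 * 40), 3, 40)
idx <- sample(ncol(A))
half1 <- A[, idx[1:20]]
half2 <- A[, idx[21:40]]
d <- model_distance(simple_nmf(half1, 3)$W, simple_nmf(half2, 3, seed = 2)$W)
```

A small `d` suggests the two halves yield similar factors at that rank. For larger ranks, an assignment solver such as `clue::solve_LSAP` would be a more scalable replacement for the exhaustive search.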

Here scNMF::nnmf.cv is run on several simulated datasets, and then scNMF::canyon.plot is used to visualize the results. The left dataset

R code to reproduce this figure in 1 minute

library(scNMF)
library(NMF)      # syntheticNMF()
library(ggplot2)  # ggtitle(), labs(), theme(), ggsave()
library(cowplot)  # plot_grid(), ggdraw(), draw_label(), get_legend()
library(Seurat)   # NoLegend()

# simulate a noisy 5000 x 500 matrix of rank 10
syn <- syntheticNMF(5000, 10, 500, seed = 123, noise = TRUE)
# cross-validate ranks 3 to 25 with five random starts per rank
cv <- nnmf.cv(syn, byrow = FALSE, k = seq(3,25,1), n.starts = 5, ribbon.confidence = 0.99)
p1 <- canyon.plot(cv)
p2 <- canyon.plot(cv, collapse = TRUE)

# wrap a long caption string to a fixed width
wrapper <- function(x, ...) {paste(strwrap(x, ...), collapse = "\n")}

canyonplot <- plot_grid(
    ggdraw() + draw_label("NMF cross-validation on a simulated dataset of rank 10\nusing a measure of angular similarity between models", size = 14),
    plot_grid(
        p1 + NoLegend() + ggtitle("all random starts"),
        get_legend(p1),
        p2 + ggtitle("average of all random starts"),
        ncol = 3,
        rel_widths = c(1,0.1,1),
        labels = c("A","","B")
    ) + labs(caption  = wrapper("Result of `scNMF::nnmf.cv` plotted by `scNMF::canyon.plot` with (A) `collapse = FALSE` or (B) `collapse = TRUE`. Ribbon represents a 99% confidence ribbon about the five random starts based on a linear loess fit. The mean model angle is the sum of the angles of the factors, where the factors are matched to achieve the minimum possible overall model angle.", width = 160)) +
        theme(plot.caption = element_text(hjust = 0, size = 10)),
    nrow = 2,
    rel_heights = c(0.2,1)
)

ggsave("canyonplot.png", plot = canyonplot, width = 10, height = 5, units = "in", dpi = "retina")

[Figure: NMF cross-validation on a synthetic dataset]

NMF cross-validation on the PBMC3k dataset (stopping criteria: min.dist = 0.01, min.cells = 5)

[Figure: cross-validation]

NMF cross-validation on the PBMC3k dataset (stopping criteria: min.cells = 5)

[Figure: cross-validation]

NMF cross-validation on the bmcite dataset (stopping criteria: min.cells = 10)

[Figure: cross-validation]

Similar results were obtained for the moca7k dataset and the entire MOCAE13.5 dataset.

The "smart split" maximizes signal redundancy between halves of the input, thus theoretically giving the objective function the most statistical power possible for measuring robustness. Note how smart split captured all of the information the other five runs captured collectively, and with far less volatility in the signal. This means "k-fold" cross-validation only needs to be run once with smart split.
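
The README does not say how the smart split is constructed. Purely as an illustration of what "maximizing signal redundancy between halves" could mean, the sketch below greedily pairs each column with its most similar unassigned column (by cosine similarity) and sends one member of each pair to each half. The function names `cosine_sim_cols` and `paired_split` are hypothetical, and this pairing heuristic is an assumption, not the scNMF implementation.

```r
# Cosine similarity between all pairs of columns of A
cosine_sim_cols <- function(A) {
  norms <- pmax(sqrt(colSums(A^2)), 1e-12)
  crossprod(sweep(A, 2, norms, "/"))
}

# Hypothetical redundancy-maximizing split: greedy nearest-neighbor pairing.
# With an odd number of columns, the last unpaired column is left out.
paired_split <- function(A) {
  S <- cosine_sim_cols(A)
  diag(S) <- -Inf                      # a column cannot pair with itself
  unassigned <- seq_len(ncol(A))
  half1 <- integer(0)
  half2 <- integer(0)
  while (length(unassigned) >= 2) {
    i <- unassigned[1]
    rest <- unassigned[-1]
    j <- rest[which.max(S[i, rest])]   # most similar remaining column
    half1 <- c(half1, i)
    half2 <- c(half2, j)
    unassigned <- setdiff(unassigned, c(i, j))
  }
  list(half1 = half1, half2 = half2)
}

# Usage: columns 4-6 are scaled copies of columns 1-3, so each copy
# should land in the opposite half from its original
A_demo <- cbind(diag(3), 2 * diag(3))
split_demo <- paired_split(A_demo)
```

Under this heuristic each half contains a near-duplicate of every signal in the other half, which is the property the paragraph above attributes to the smart split.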

zdebruine/scNMF documentation built on Jan. 1, 2021, 1:50 p.m.

