scNMF is a toolkit for non-negative matrix factorization (NMF) of single-cell data.
See the vignettes folder for a fast and gentle introduction to scNMF and a vignette to reproduce figures in the scNMF manuscript.
scNMF introduces a new method for cross-validation based on the robustness of NMF models trained on a bipartition of the input. In the example below, scNMF::nnmf.cv is run on a simulated dataset and scNMF::canyon.plot is used to visualize the results.
library(scNMF)
library(NMF)      # syntheticNMF()
library(ggplot2)  # ggsave(), ggtitle(), labs(), theme()
library(cowplot)  # plot_grid(), ggdraw(), draw_label(), get_legend()
library(Seurat)   # NoLegend()
syn <- syntheticNMF(5000, 10, 500, seed = 123, noise = TRUE)  # 5000 x 500 matrix of true rank 10
cv <- nnmf.cv(syn, byrow = FALSE, k = seq(3, 25, 1), n.starts = 5, ribbon.confidence = 0.99)
p1 <- canyon.plot(cv)                   # one line per random start
p2 <- canyon.plot(cv, collapse = TRUE)  # average across random starts
wrapper <- function(x, ...) {paste(strwrap(x, ...), collapse = "\n")}  # wrap long caption text
canyonplot <- plot_grid(
  ggdraw() + draw_label("NMF cross-validation on a simulated dataset of rank 10\nusing a measure of angular similarity between models", size = 14),
  plot_grid(
    p1 + NoLegend() + ggtitle("all random starts"),
    get_legend(p1),
    p2 + ggtitle("average of all random starts"),
    ncol = 3,
    rel_widths = c(1, 0.1, 1),
    labels = c("A", "", "B")
  ) + labs(caption = wrapper("Result of `scNMF::nnmf.cv` plotted by `scNMF::canyon.plot` with (A) `collapse = FALSE` or (B) `collapse = TRUE`. The ribbon is a 99% confidence ribbon about the five random starts, based on a loess fit. The mean model angle is the mean of the angles between matched factors, where factors are matched to minimize the overall model angle.", width = 160)) +
    theme(plot.caption = element_text(hjust = 0, size = 10)),
  nrow = 2,
  rel_heights = c(0.2, 1)
)
ggsave("canyonplot.png", plot = canyonplot, width = 10, height = 5, units = "in", dpi = "retina")
NMF cross-validation was also run on real single-cell datasets:
- PBMC3k (stopping criteria: min.dist = 0.01, min.cells = 5)
- PBMC3k (stopping criteria: min.cells = 5)
- bmcite (stopping criteria: min.cells = 10)
Similar results were obtained for the moca7k dataset and the entire MOCAE13.5 dataset.
The "smart split" maximizes signal redundancy between halves of the input, thus theoretically giving the objective function the most statistical power possible for measuring robustness. Note how smart split captured all of the information the other five runs captured collectively, and with far less volatility in the signal. This means "k-fold" cross-validation only needs to be run once with smart split.