R/nnmf.R

Defines functions nnmf

Documented in nnmf

#' Non-negative matrix factorization by alternating NNLS
#'
#' @description
#' Fast non-negative matrix factorization by alternating least squares with sequential coordinate descent against mean squared error loss
#'
#' @details
#' Methods are adapted from Lin and Boutros, 2018 BMC Bioinformatics with several improvements for efficiency in unregularized and non-masked use cases. 
#' This function is theoretically equivalent to \code{NNLM::nnmf(..., method = "scd", loss = "mse", alpha = c(0,0,0), beta = c(0,0,0), mask = NULL, init = NULL)}. 
#' \code{scNMF::nnmf} offers >2x faster calculation than \code{NNLM::nnmf} on a single thread for low-rank factorization of a 10k x 200 matrix, and increasingly faster with matrix size, number of threads, and matrix rank.
#' Key differences from \code{NNLM::nnmf} are 1) an improved OpemMP multithreading loop structure, 2) a parallelized MSE error loss function, 3) fewer conditional checks, 4) no MKL error calculations, and 5) no support for regularization, masking, or non-random initialization
#'
#' In addition to bare matrices, Seurat objects may be passed to this function. Automatically, \code{nnmf} will extract the counts matrix from the default assay, but when \code{reduction.use} is specified with a valid dimensional reduction slot, the feature embeddings will be factorized and the resulting NMF model will be projected to all samples.
#' This is convenient, for example, when NMF is run on cluster centers and cells must be visualized on the resulting NMF coordinates.
#'
#' See also \code{scNMF::nnmf.cv} and \code{scNMF::canyon.plot}, dedicated functions for NMF cross-validation. These functions are helpful in determining the optimal rank k for any factorization.
#'
#' For details on the projection algorithm (when Seurat objects are specified and \code{reduction.use} is specified, see \code{scNMF::nnls.project} documentation. A block size of 1000 is used in this instance.
#'
#' @param A A matrix or a Seurat object to be factorized. If a Seurat object, a reduction name may also be supplied, otherwise the entire dataset will be factorized. If sparse, will be converted to dense.
#' @param k Decomposition rank, integer (required)
#' @param rel.tol Stop criterion, defined as the relative tolerance between two successive iterations: |e2-e1|/avg(e1,e2). (default 1e-3)
#' @param verbose boolean, give updates every trace iterations
#' @param n.threads Number of threads/CPUs to use. Default to 0 (all cores).
#' @param max.iter Maximum number of alternating NNLS solutions for H and W, integer (default 1000)
#' @param trace An integer specifying a multiple of iterations at which MSE error should be calculated and checked for convergence. To check error every iteration, specify 1. To avoid checking error at all, specify trace > max.iter (default is 5, and is generally an efficient and effective value)
#' @param reduction.use seurat obj input only:  Specify a reduction to use feature loadings (i.e. cluster centers, such as "dclus"), otherwise leave as NULL to use the entire counts matrix from the default assay (NULL by default). If a reduction is specified, the NMF coordinates will be projected back onto all cells and the results of the projection will be stored in a dimensional reduction object. The original model will be stored in the "misc" slot of that object.
#' @param reduction.key Seurat obj input only: reduction key for the resulting projection. Should end in an underscore. (default "NMF_")
#' @param reduction.name Seurat obj input only: reduction name for the resulting projection (default "nmf")
#' @param inner.rel.tol Tolerance for NNLS SCD
#' @param inner.max.iter Maximum iterations permitted for NNLS SCD
#' @return A list of W and H matrices if a matrix was input, or a Seurat object with result as a dimensional reduction of reduction.key with projections for all cells
#' @seealso \code{scNMF::nnmf.cv}
nnmf <- function(A, k = NULL, max.iter = 1000, rel.tol = 1e-3, n.threads = 0, verbose = TRUE, trace = 5, reduction.use = NULL, reduction.key = "NMF_", reduction.name = "nmf", inner.rel.tol = 1e-6, inner.max.iter = 100) {
  seuratObj <- NULL
  if (class(A)[1] == "Seurat") {
    if (!is.null(reduction.use)) {
      if (reduction.use %in% Reductions(A) == FALSE) stop("Input reduction = ", reduction.use, " but this name is not in Reductions(A) = ", str(Reductions(A)))
      if (verbose > 0) message("\nExtracting feature loadings from reduction = ", reduction.use, "\n")
      seuratObj <- A
      A <- A@reductions[[reduction.use]]@feature.loadings
    } else {
      if (verbose > 0) message("\nExtracting counts matrix from default assay of Seurat object, and converting to dense matrix format")
      seuratObj <- A
      A <- as.matrix(GetAssayData(A))
    }
  }
  if (class(A)[1] == "dgCMatrix") A <- as.matrix(A)
  if (is.null(k)) stop("Specify a positive integer value for k")
  if (rel.tol > 0.1) stop("rel.tol is greater than 0.1. Results will be unstable and unreproducible")
  if (trace < 1) trace <- 1
  if (n.threads < 0) n.threads <- 0

  res <- c_nnmf(
    A,
    as.integer(k),
    as.integer(max.iter),
    as.double(rel.tol),
    as.integer(n.threads),
    as.integer(verbose),
    as.integer(inner.max.iter),
    as.double(inner.rel.tol),
    as.integer(trace)
  )

  rownames(res$W) <- rownames(A)
  colnames(res$H) <- colnames(A)
  colnames(res$W) <- rownames(res$H) <- paste0(reduction.key, 1:k)

  if (!is.null(seuratObj)) {
    if (!is.null(reduction.use)) {
      assay <- DefaultAssay(seuratObj)
      data <- as.matrix(GetAssayData(seuratObj, assay = DefaultAssay(seuratObj)))
      message("\n\nProjecting cells onto NMF factor coordinates\n")
      H <- nnls.project(data, W.model = res$W, n.threads = n.threads, block.size = 1000)
      rownames(H) <- paste0(reduction.key, 1:k)
      # create a new dimensional reduction object
      nmf.dr <- CreateDimReducObject(embeddings = as.matrix(t(H)), loadings = as.matrix(res$W), key = reduction.key, assay = assay, misc = list("H_original" = res$H, "reduction.use" = reduction.use))
    } else {
      nmf.dr <- CreateDimReducObject(embeddings = as.matrix(t(res$H)), loadings = as.matrix(res$W), key = reduction.key, assay = assay)
    }

    # add the dimensional reduction object to Seurat
    seuratObj[[reduction.name]] <- nmf.dr
    return(seuratObj)
  } else {
    return(res)
  }
}
zdebruine/scNMF documentation built on Jan. 1, 2021, 1:50 p.m.