factorize: Maximum likelihood factorization


View source: R/factorize.R

Description

Performs single- or multiple-rank NMF factorization of a count matrix using maximum likelihood.

Usage

factorize(object, ranks = 2, nrun = 20, randomize = FALSE,
  nsmpl = 1, verbose = 2, progress.bar = TRUE, Itmax = 10000,
  ncnn.step = 40, criterion = "likelihood", linkage = "average",
  Tol = 1e-05, store.connectivity = FALSE)

Arguments

object

scNMFSet object containing the count matrix.

ranks

Rank for factorization; can be a vector of multiple values.

nrun

Number of runs, each with a different initial guess.

randomize

Boolean; if TRUE, the input matrix is randomized.

nsmpl

Number of randomized samples to average over.

verbose

Verbosity level: 3, output is printed for each iteration; 2, for each run; 1, for each randomized sample; 0, silent.

progress.bar

Display progress bar when nrun > 1 and verbose = 1.

Itmax

Maximum number of iterations.

ncnn.step

Minimum number of steps with no change in the connectivity matrix required for convergence.

criterion

If criterion = 'likelihood', iteration stops when the fractional change in likelihood falls below the tolerance Tol. If criterion = 'connectivity', iteration stops when the connectivity matrix has not changed for at least ncnn.step steps.

linkage

Method to be sent to hclust in calculating cophenetic correlation.

Tol

Tolerance for checking convergence with criterion = 'likelihood'.

store.connectivity

Boolean; if TRUE, returns a list that also contains connectivity data.
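
The two stopping rules selected by criterion can be sketched in base R as follows. This is only an illustration of the rules as described above, not the package's internal code; the helper names likelihood_converged and connectivity_converged and the counter n_unchanged are hypothetical.

```r
# Hypothetical helper (not part of ccfindR): TRUE when the fractional
# change in likelihood between two successive iterations is below tol.
likelihood_converged <- function(old, new, tol = 1e-05) {
  abs(new - old) / abs(old) < tol
}

# Hypothetical helper: TRUE when the connectivity matrix has stayed
# unchanged for at least ncnn.step consecutive iterations.
connectivity_converged <- function(n_unchanged, ncnn.step = 40) {
  n_unchanged >= ncnn.step
}

small_change <- likelihood_converged(-1000, -999.999)  # relative change 1e-06
large_change <- likelihood_converged(-1000, -990)      # relative change 1e-02
too_few_steps <- connectivity_converged(12)            # fewer than 40 steps
```

Under these assumptions, a smaller Tol or a larger ncnn.step makes the corresponding rule stricter, at the cost of more iterations (bounded by Itmax).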

Details

The main input is the scNMFSet object with a count matrix. This function performs non-negative matrix factorization and fills in the empty slots basis, coeff, and ranks.

When run with multiple values of ranks, factorization is repeated for each rank, and the slot measure contains quality measures for each rank. The quality measure likelihood is the negative of the KL distance of the fit to the target. With nrun > 1, the likelihood is the maximum over all runs.
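
To make the likelihood measure concrete, the sketch below computes the negative of a KL distance between a count matrix and a fit W %*% H in base R. It assumes the generalized KL divergence commonly used in NMF; the exact normalization used by the package may differ, and the function kl_divergence and the matrices W, H, X are illustrative only.

```r
# Generalized KL divergence between a count matrix X and its fit W %*% H.
# Assumed form; zero counts contribute 0 to the X * log(X / fit) term.
kl_divergence <- function(X, W, H) {
  fit <- W %*% H
  log_term <- ifelse(X > 0, X * log(X / fit), 0)
  sum(log_term - X + fit)
}

set.seed(1)
W <- matrix(runif(10 * 2), nrow = 10)          # basis: features x rank
H <- matrix(runif(2 * 5), nrow = 2)            # coefficients: rank x samples
X <- matrix(rpois(50, lambda = 5), nrow = 10)  # simulated counts

# The likelihood measure is taken as the negative of the divergence,
# so a better fit gives a larger (less negative) value.
likelihood_exact <- -kl_divergence(W %*% H, W, H)  # target equals the fit
likelihood_noisy <- -kl_divergence(X, W, H)        # target differs from the fit
```

With several runs, the reported value would correspond to the largest of the per-run likelihoods, for example max() over a vector of such values.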

The quality measure dispersion is a scalar measure of how far the entries of the connectivity matrix are from 0 or 1. With increasing nrun, dispersion decreases from 1; nrun should be chosen large enough that dispersion no longer changes appreciably.

With randomization (randomize = TRUE), the count matrix of object is shuffled. nsmpl can be used to average over multiple permutations; the averaging is applied to each quality measure at a given rank.
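
The behavior of dispersion with nrun can be illustrated with a small base R sketch. It assumes the definition common in the NMF literature, the mean of 4 * (C - 1/2)^2 over the connectivity matrix C averaged across runs, which equals 1 when every entry is exactly 0 or 1. Whether the package uses precisely this formula is an assumption, and the functions and cluster labels below are hypothetical.

```r
# Connectivity matrix of one run: 1 if two samples share a cluster, else 0.
connectivity <- function(labels) {
  outer(labels, labels, "==") * 1
}

# Assumed dispersion: mean of 4 * (C - 1/2)^2, where C is the
# connectivity matrix averaged over all runs.
dispersion <- function(label_list) {
  consensus <- Reduce(`+`, lapply(label_list, connectivity)) / length(label_list)
  mean(4 * (consensus - 0.5)^2)
}

# Hypothetical cluster assignments of 6 samples from three runs each.
# In 'stable' every run yields the same partition (label names may swap).
stable   <- list(c(1, 1, 1, 2, 2, 2), c(1, 1, 1, 2, 2, 2), c(2, 2, 2, 1, 1, 1))
unstable <- list(c(1, 1, 1, 2, 2, 2), c(1, 1, 2, 2, 2, 1), c(1, 2, 1, 2, 1, 2))

d_stable   <- dispersion(stable)    # all averaged entries are 0 or 1
d_unstable <- dispersion(unstable)  # runs disagree, so entries lie between 0 and 1
```

In this sketch, runs that disagree on cluster membership pull the averaged entries away from 0 and 1 and lower the dispersion below 1.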

Value

Object of class scNMFSet with factorization slots filled.

Examples

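library(ccfindR)
set.seed(1)
# Simulate a count matrix: 10 features, sample sizes given per cluster
x <- simulate_data(nfeatures = 10, nsamples = c(20, 20, 60, 40, 30))
s <- scNMFSet(count = x)
# Factorize for ranks 2 through 8, with 5 runs per rank
s <- factorize(s, ranks = seq(2, 8), nrun = 5)
plot(s)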

ccfindR documentation built on Nov. 8, 2020, 5:12 p.m.