vb_factorize: Bayesian NMF inference of count matrix


View source: R/bayesian.R

Description

Perform variational Bayes NMF and store the factor matrices in the object.

Usage

vb_factorize(object, ranks = 2, nrun = 1, verbose = 2,
  progress.bar = TRUE, initializer = "random", Itmax = 10000,
  hyper.update = rep(TRUE, 4), gamma.a = 1, gamma.b = 1,
  Tol = 1e-05, hyper.update.n0 = 10, hyper.update.dn = 1,
  connectivity = TRUE, fudge = NULL, ncores = 1, useC = TRUE,
  unif.stop = TRUE)

Arguments

object

scNMFSet object containing count matrix.

ranks

Rank for factorization; can be a vector of multiple values.

nrun

Number of runs with different initial guesses.

verbose

Verbosity level: 3, output printed for each iteration; 2, output printed for each run; 1, output printed for each randomized sample; 0, silent.

progress.bar

Display progress bar with verbose = 1 for multiple runs.

initializer

Initialization method: 'random' for randomized initial conditions; 'svd2' for initial conditions derived from singular value decomposition.

Itmax

Maximum number of iterations.

hyper.update

Vector of four logicals, each indicating whether the corresponding hyperparameter in c(aw, bw, ah, bh) should be optimized.

gamma.a

Gamma distribution shape parameter.

gamma.b

Gamma distribution mean. gamma.a and gamma.b set the values of those hyperparameters whose hyper.update elements are FALSE.

Tol

Tolerance for terminating iteration.

hyper.update.n0

Initial number of steps in which hyperparameters are fixed.

hyper.update.dn

Step intervals for hyperparameter updates.

connectivity

If TRUE, connectivity and dispersion will be calculated after each run. Can be turned off to save memory.

fudge

Small positive number used as lower bound for factor matrix elements to avoid singularity. If fudge = NULL (default), it will be replaced by .Machine$double.eps. Can be set to 0 to skip regularization.

ncores

Number of processors (cores) to use. If ncores > 1, parallelization is attempted.

useC

Use C++ version of updates for speed.

unif.stop

Terminate if any column of the basis matrix is uniform.
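
As a rough illustration of the gamma.a and gamma.b arguments above: a gamma distribution described by its shape and mean can be mapped to base R's shape/rate form by taking rate = shape / mean. The sketch below uses only base R and the default values from the usage line. The variable names are hypothetical, and how the package uses these values internally is an assumption, not something this page states.

```r
# Sketch only: relate a (shape, mean) description of a gamma distribution
# to base R's (shape, rate) parameterization. Not taken from the package source.
gamma_a <- 1                      # shape, same value as the gamma.a default
gamma_b <- 1                      # mean, same value as the gamma.b default
rate <- gamma_a / gamma_b         # a gamma distribution has mean = shape / rate

set.seed(1)
draws <- rgamma(1e5, shape = gamma_a, rate = rate)

prior_mean <- gamma_a / rate      # analytic mean, equal to gamma_b by construction
sample_mean <- mean(draws)        # empirical mean of the draws, close to gamma_b
```

Changing gamma_b while holding gamma_a fixed rescales the distribution without changing its shape.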

Details

The main input is the scNMFSet object containing the count matrix. This function performs non-negative matrix factorization using a variational Bayes algorithm with gamma priors. The slots basis, coeff, and ranks are filled.

When run with multiple values of ranks, the factorization is repeated for each rank, and the slot measure contains the log evidence and optimal hyperparameters for each rank. With nrun > 1, the solution with the maximum log evidence is stored for each rank.
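
The per-rank selection described above can be pictured with a few lines of base R. This is a minimal sketch, not the package's implementation: it copies the log evidence values for ranks 4 to 6 from the Example output on this page into a matrix (ranks in rows, runs in columns) and picks the run with the largest value for each rank. The names log_ev, best_run, and best_log_ev are hypothetical.

```r
# Log evidence by rank (rows) and run (columns), copied from the example output.
log_ev <- rbind(
  "4" = c(-1.915525, -1.82324,  -1.766811, -1.815812, -1.771486),
  "5" = c(-1.910936, -1.272718, -1.25892,  -1.645365, -1.248135),
  "6" = c(-1.298224, -1.290593, -1.2778,   -1.285885, -1.267798)
)
colnames(log_ev) <- paste0("run", 1:5)

# For each rank, the index of the run with the maximum log evidence ...
best_run <- apply(log_ev, 1, which.max)
# ... and the log evidence of that run.
best_log_ev <- apply(log_ev, 1, max)
```

For rank 5, for instance, this selects run 5, whose log evidence is -1.248135.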

Value

Object of class scNMFSet with factorization slots filled.

Examples

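set.seed(1)
# Simulate a 50 x 100 count matrix of rank 5
x <- simulate_whx(nrow = 50, ncol = 100, rank = 5)
s <- scNMFSet(x$x)
# Factorize for ranks 2 through 8 with 5 runs per rank
s <- vb_factorize(s, ranks = seq(2, 8), nrun = 5)
plot(s)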

Example output

Run 1
Rank = 2: Nsteps =31, log(evidence) =-3.220272, hyper = (0.3125798,2.421532,0.2553286,1.090867), dispersion = 1
Rank = 3: Nsteps =40, log(evidence) =-2.301975, hyper = (0.1956184,1.692225,0.1743059,1.03016), dispersion = 1
Rank = 4: Nsteps =90, log(evidence) =-1.915525, hyper = (0.133439,1.341696,0.1292562,0.9847309), dispersion = 1
Rank = 5: Nsteps =101, log(evidence) =-1.910936, hyper = (0.1036276,0.9849012,0.1268327,1.034932), dispersion = 1
Rank = 6: Nsteps =105, log(evidence) =-1.298224, hyper = (0.1065642,0.9108574,0.09708988,0.9640453), dispersion = 1
Rank = 7: Nsteps =75, log(evidence) =-1.337305, hyper = (0.0883038,0.7927825,0.09726714,0.9526487), dispersion = 1
Rank = 8: Nsteps =187, log(evidence) =-1.319013, hyper = (0.08472345,0.7047214,0.07931773,0.8275366), dispersion = 1
Run 2
Rank = 2: Nsteps =43, log(evidence) =-2.985025, hyper = (0.2881739,2.510032,0.2511396,1.066041), dispersion = 0.5093711
Rank = 3: Nsteps =52, log(evidence) =-2.443278, hyper = (0.1834726,1.663126,0.2023086,1.063268), dispersion = 0.6676385
Rank = 4: Nsteps =51, log(evidence) =-1.82324, hyper = (0.1378883,1.358771,0.1377949,0.9858606), dispersion = 0.7315702
Rank = 5: Nsteps =44, log(evidence) =-1.272718, hyper = (0.1279342,1.102526,0.102211,0.9632341), dispersion = 0.8052895
Rank = 6: Nsteps =71, log(evidence) =-1.290593, hyper = (0.1137154,0.8591203,0.08183998,1.025736), dispersion = 0.8359017
Rank = 7: Nsteps =157, log(evidence) =-1.308199, hyper = (0.09161253,0.760215,0.07688123,0.9508089), dispersion = 0.846314
Rank = 8: Nsteps =127, log(evidence) =-1.348637, hyper = (0.07479673,0.6652848,0.0931084,0.9389081), dispersion = 0.8540192
Run 3
Rank = 2: Nsteps =42, log(evidence) =-2.985772, hyper = (0.294905,2.547975,0.2522155,1.058072), dispersion = 0.5638854
Rank = 3: Nsteps =58, log(evidence) =-2.363379, hyper = (0.192261,1.851054,0.1755101,0.9589808), dispersion = 0.6923504
Rank = 4: Nsteps =54, log(evidence) =-1.766811, hyper = (0.1404713,1.448887,0.123031,0.9150137), dispersion = 0.7837938
Rank = 5: Nsteps =59, log(evidence) =-1.25892, hyper = (0.1129743,1.027335,0.1021369,1.039533), dispersion = 0.8278495
Rank = 6: Nsteps =67, log(evidence) =-1.2778, hyper = (0.1051466,0.9334776,0.08698342,0.918255), dispersion = 0.8411773
Rank = 7: Nsteps =90, log(evidence) =-1.314135, hyper = (0.08338692,0.7330196,0.09298199,0.9727505), dispersion = 0.8620945
Rank = 8: Nsteps =89, log(evidence) =-1.340424, hyper = (0.06542349,0.6762856,0.09491738,0.9386752), dispersion = 0.8776436
Run 4
Rank = 2: Nsteps =39, log(evidence) =-2.985024, hyper = (0.2885209,2.558508,0.253911,1.042096), dispersion = 0.6320283
Rank = 3: Nsteps =49, log(evidence) =-2.71742, hyper = (0.1948697,1.755059,0.1770911,1.009096), dispersion = 0.7486985
Rank = 4: Nsteps =47, log(evidence) =-1.815812, hyper = (0.1398367,1.316441,0.1391811,1.004942), dispersion = 0.7913369
Rank = 5: Nsteps =102, log(evidence) =-1.645365, hyper = (0.1039625,1.107128,0.09529263,0.9356816), dispersion = 0.8360058
Rank = 6: Nsteps =83, log(evidence) =-1.285885, hyper = (0.09426568,0.8839006,0.0968866,0.9795459), dispersion = 0.869117
Rank = 7: Nsteps =206, log(evidence) =-1.279008, hyper = (0.08081835,0.8035273,0.09014077,0.8104198), dispersion = 0.8875469
Rank = 8: Nsteps =130, log(evidence) =-1.300439, hyper = (0.06378508,0.6629304,0.09130791,0.8810435), dispersion = 0.8916077
Run 5
Rank = 2: Nsteps =39, log(evidence) =-3.220411, hyper = (0.3099831,2.656032,0.2550805,1.009785), dispersion = 0.68
Rank = 3: Nsteps =50, log(evidence) =-2.559947, hyper = (0.1693791,1.572724,0.1597977,1.120149), dispersion = 0.7872886
Rank = 4: Nsteps =59, log(evidence) =-1.771486, hyper = (0.1394539,1.289718,0.1234046,1.045998), dispersion = 0.8453978
Rank = 5: Nsteps =55, log(evidence) =-1.248135, hyper = (0.1156881,1.069702,0.09765931,0.9793849), dispersion = 0.875252
Rank = 6: Nsteps =156, log(evidence) =-1.267798, hyper = (0.1041198,0.8776958,0.08466639,0.9334439), dispersion = 0.8932445
Rank = 7: Nsteps =106, log(evidence) =-1.300903, hyper = (0.08248962,0.7908929,0.08963676,0.9036839), dispersion = 0.90404
Rank = 8: Nsteps =183, log(evidence) =-1.305394, hyper = (0.07657741,0.6521651,0.07984908,0.8457556), dispersion = 0.911237

ccfindR documentation built on Nov. 8, 2020, 5:12 p.m.