ParallelGibbsSample: Setup hierarchical Dirichlet processes and run parallel Gibbs...

ParallelGibbsSample    R Documentation

Set up hierarchical Dirichlet processes and run parallel Gibbs sampling chains

Description

Set up hierarchical Dirichlet processes and run parallel Gibbs sampling chains.

Usage

ParallelGibbsSample(
  input.catalog,
  seedNumber = 1,
  K.guess,
  multi.types = FALSE,
  verbose = FALSE,
  burnin = 5000,
  burnin.multiplier = 2,
  post.n = 200,
  post.space = 100,
  post.cpiter = 3,
  post.verbosity = 0,
  CPU.cores = 20,
  num.child.process = 20,
  gamma.alpha = 1,
  gamma.beta = 20,
  checkpoint = TRUE
)

Arguments

input.catalog

Input spectra catalog as a matrix or in ICAMS format.

seedNumber

A random seed that ensures reproducible results.

K.guess

Suggested initial value of the number of raw clusters. Usually, the number of raw clusters is roughly twice the number of extracted signatures. Passed to hdpx::dp_activate as argument initcc.

multi.types

A logical scalar or a character vector.

If FALSE, the HDP analysis will regard all input spectra as one tumor type, and the HDP structure will have one parent node for all tumors.

If TRUE, sample IDs in input.catalog must have the form sample_type::sample_id.

If a character vector, then its length must be ncol(input.catalog), and each value is the sample type of the corresponding column in input.catalog, e.g. c(rep("Type-A", 23), rep("Type-B", 10)) for 23 Type-A samples and 10 Type-B samples.

If not FALSE, HDP will have one parent node for each sample type and one grandparent node.
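
As an illustration of the two non-FALSE forms, the base-R sketch below builds a character vector of sample types and the equivalent sample_type::sample_id column names. The catalog is a made-up matrix of zeros with 96 rows, and the sample IDs (S1, S2, ...) are hypothetical.

```r
# Hypothetical catalog: 96 mutation classes (rows) by 33 samples (columns).
n.a <- 23
n.b <- 10
catalog <- matrix(0, nrow = 96, ncol = n.a + n.b)

# Form 1: a character vector with one sample type per column.
sample.types <- c(rep("Type-A", n.a), rep("Type-B", n.b))
stopifnot(length(sample.types) == ncol(catalog))

# Form 2: encode the type in the column names as sample_type::sample_id,
# the naming required when multi.types = TRUE.
sample.ids <- paste0("S", seq_len(ncol(catalog)))
colnames(catalog) <- paste(sample.types, sample.ids, sep = "::")

# Recover the sample types from the column names.
parsed.types <- sub("::.*$", "", colnames(catalog))
table(parsed.types)
```

Either sample.types (as multi.types) or the renamed catalog (with multi.types = TRUE) describes the same grouping of samples.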

verbose

If TRUE, print progress messages.

burnin

The number of burn-in iterations in one batch. The total number of burn-in iterations is burnin * burnin.multiplier.

burnin.multiplier

Run burnin.multiplier rounds of burn-in iterations. If checkpoint is TRUE, save the burn-in chain (see argument checkpoint). The diagnostic plot diagnostics.likelihood.pdf can help determine whether the chain is stationary. The burn-in can be continued from a checkpoint file with ExtendBurnin.

post.n

The number of posterior samples to collect.

post.space

The number of iterations between collected samples.
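
For a rough sense of the per-chain iteration budget under the default settings, the arithmetic can be sketched as below. The burn-in total follows the formula given for burnin; the posterior-phase figure is an approximation that assumes about post.space iterations per collected sample.

```r
burnin <- 5000
burnin.multiplier <- 2
post.n <- 200
post.space <- 100

# Total burn-in iterations, as defined for the burnin argument.
total.burnin <- burnin * burnin.multiplier

# Approximate posterior-phase iterations (assumption: post.space per sample).
approx.posterior <- post.n * post.space

total.burnin + approx.posterior
```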

post.cpiter

The number of concentration-parameter sampling iterations to perform after each Gibbs sampling iteration.

post.verbosity

Verbosity of debugging statements. No need to change except for development purposes.

CPU.cores

Number of CPUs to use; this should be no more than num.child.process.

num.child.process

Number of posterior sampling chains; can be set to 1 for testing. We recommend 20 for real data analysis.
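
One way to keep CPU.cores no larger than num.child.process is to cap it at the number of chains, as in this minimal sketch. It uses parallel::detectCores() from the parallel package that ships with R; how many cores to actually request on a shared machine is a judgment call not covered here.

```r
num.child.process <- 20

# parallel::detectCores() may return NA on some platforms; fall back to 1.
available <- parallel::detectCores()
if (is.na(available)) available <- 1L

# Use no more CPUs than there are chains.
CPU.cores <- min(available, num.child.process)
CPU.cores
```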

gamma.alpha

Shape parameter of the gamma distribution prior for the Dirichlet process concentration parameters α_0 and all α_j in Figure B.1 of

  • https://www.repository.cam.ac.uk/bitstream/handle/1810/275454/Roberts-2018-PhD.pdf

gamma.beta

Inverse scale parameter (rate parameter) of the gamma distribution prior for the Dirichlet process concentration parameters: β_0 and all β_j in Figure B.1 of

  • https://www.repository.cam.ac.uk/bitstream/handle/1810/275454/Roberts-2018-PhD.pdf

We recommend gamma.alpha = 1 and gamma.beta = 20 for single-base-substitution signature extraction, and gamma.alpha = 1 and gamma.beta = 50 for doublet-base-substitution and indel signature extraction.
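
To see what the two recommended settings imply, recall that a gamma distribution with shape a and rate b has mean a / b and variance a / b^2. The sketch below computes these for both settings and evaluates the prior densities on a grid; gamma.summary is a helper defined here for illustration, not part of mSigHdp.

```r
# Mean and variance of a gamma distribution with the given shape and rate.
gamma.summary <- function(shape, rate) {
  c(mean = shape / rate, variance = shape / rate^2)
}

sbs.prior   <- gamma.summary(shape = 1, rate = 20)
other.prior <- gamma.summary(shape = 1, rate = 50)
sbs.prior
other.prior

# Prior densities on a grid, e.g. for plot(x, d20, type = "l").
x <- seq(0.001, 0.5, length.out = 200)
d20 <- dgamma(x, shape = 1, rate = 20)
d50 <- dgamma(x, shape = 1, rate = 50)
```

The larger rate places more prior mass on small concentration parameters.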

checkpoint

If TRUE, then

  • Checkpoint each final Gibbs sample chain to the current working directory, in a file called mSigHdp.sample.checkpoint.x.Rdata, where x depends on seedNumber.

  • Periodically checkpoint the burn-in state to the current working directory, in files called mSigHdp.burnin.checkpoint.x.Rdata, where x depends on seedNumber.
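
Checkpoint files matching the naming scheme above can be located with base R, as in the sketch below. FindSampleCheckpoints is a hypothetical helper, and the demonstration uses empty placeholder files with an assumed x of 1 rather than real checkpoints.

```r
# Find sample-chain checkpoint files in a directory (default: working directory).
FindSampleCheckpoints <- function(dir = ".") {
  list.files(
    path = dir,
    pattern = "^mSigHdp\\.sample\\.checkpoint\\..*\\.Rdata$",
    full.names = TRUE
  )
}

# Demonstration with empty placeholder files in a temporary directory.
demo.dir <- tempfile("checkpoint-demo")
dir.create(demo.dir)
invisible(file.create(file.path(demo.dir, c(
  "mSigHdp.sample.checkpoint.1.Rdata",
  "mSigHdp.burnin.checkpoint.1.Rdata"
))))

found <- FindSampleCheckpoints(demo.dir)
basename(found)
```

A real checkpoint file could then be read with load(); the names of the objects stored inside are not stated in this page, so inspect them with ls() after loading into a fresh environment.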

Value

Invisibly, the clean chlist (output of CleanChlist). This is a list of hdpSampleChain-class objects (see package hdpx).
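
A small test call might look like the sketch below. The toy catalog is random Poisson counts, not real mutational spectra, and the reduced burnin, post.n, and post.space values are assumptions chosen only to keep a test run short; whether mSigHdp accepts this exact toy matrix has not been verified here. The call is guarded so the block does nothing unless run.example is set to TRUE and mSigHdp is installed.

```r
# Toy input: 96 mutation classes by 10 hypothetical samples.
set.seed(1)
toy.catalog <- matrix(rpois(96 * 10, lambda = 5), nrow = 96)
colnames(toy.catalog) <- paste0("S", 1:10)

run.example <- FALSE  # set to TRUE to actually run the sampler
if (run.example && requireNamespace("mSigHdp", quietly = TRUE)) {
  # The value is returned invisibly, so assign it to keep it.
  chlist <- mSigHdp::ParallelGibbsSample(
    input.catalog     = toy.catalog,
    seedNumber        = 1,
    K.guess           = 5,
    multi.types       = FALSE,
    burnin            = 100,
    burnin.multiplier = 1,
    post.n            = 5,
    post.space        = 5,
    CPU.cores         = 1,
    num.child.process = 1,
    checkpoint        = FALSE
  )
  # chlist is a list of hdpSampleChain-class objects.
  length(chlist)
}
```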


steverozen/mSigHdp documentation built on Feb. 6, 2023, 1:36 a.m.