View source: R/RunHdpxParallel.R

RunHdpxParallel    R Documentation

Extract (discover) mutational signatures from a matrix of mutational spectra

Description

Please see the vignette for an example.

Usage

RunHdpxParallel(
  input.catalog,
  seedNumber = 123,
  K.guess,
  multi.types = FALSE,
  verbose = FALSE,
  burnin = 1000,
  burnin.multiplier = 10,
  post.n = 200,
  post.space = 100,
  post.cpiter = 3,
  post.verbosity = 0,
  CPU.cores = 20,
  num.child.process = 20,
  high.confidence.prop = 0.9,
  hc.cutoff = NULL,
  merge.raw.cluster.args = hdpx::default_merge_raw_cluster_args(),
  overwrite = TRUE,
  out.dir = paste0("./RunHdpxParallel_out_", as.numeric(Sys.time())),
  gamma.alpha = 1,
  gamma.beta = 20,
  checkpoint = TRUE,
  downsample_threshold = NULL
)

Arguments

input.catalog

Input spectra catalog as a matrix or in ICAMS format.

seedNumber

A random seed that ensures reproducible results.

K.guess

Suggested initial value of the number of raw clusters. Usually, the number of raw clusters is roughly twice the number of extracted signatures. Passed to hdpx::dp_activate as argument initcc.

multi.types

A logical scalar or a character vector.

If FALSE, the HDP analysis will regard all input spectra as one tumor type, and the HDP structure will have one parent node for all tumors.

If TRUE, sample IDs in input.catalog must have the form sample_type::sample_id.

If a character vector, then its length must be ncol(input.catalog), and each value is the sample type of the corresponding column in input.catalog, e.g. c(rep("Type-A", 23), rep("Type-B", 10)) for 23 Type-A samples and 10 Type-B samples.

If not FALSE, HDP will have one parent node for each sample type and one grandparent node.
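
The two ways of supplying sample types described above can be sketched in base R as follows. The toy matrix, its dimensions, the sample IDs, and the type names are hypothetical and serve only as an illustration; nothing is passed to RunHdpxParallel here.

```r
# Hypothetical toy catalog: 96 mutation types (rows) by 5 samples (columns).
set.seed(1)
toy.catalog <- matrix(rpois(96 * 5, lambda = 10), nrow = 96, ncol = 5)

# Option 1: a character vector with one sample type per column,
# which could be supplied as multi.types = sample.types.
sample.types <- c(rep("Type-A", 3), rep("Type-B", 2))
stopifnot(length(sample.types) == ncol(toy.catalog))

# Option 2: encode the type in the column names as sample_type::sample_id,
# which is the form required when multi.types = TRUE.
sample.ids <- paste0("S", seq_len(ncol(toy.catalog)))
colnames(toy.catalog) <- paste(sample.types, sample.ids, sep = "::")

# Recover the sample types from the column names to check the encoding.
recovered.types <- sub("::.*$", "", colnames(toy.catalog))
```

With either option, the number of distinct sample types determines how many parent nodes the HDP structure will have.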

verbose

If TRUE, print progress messages.

burnin

The number of burn-in iterations in one batch. The total number of burn-in iterations is burnin * burnin.multiplier.

burnin.multiplier

Run burnin.multiplier rounds of burn-in iterations. If checkpoint is TRUE, the burn-in chain is saved (see argument checkpoint). The diagnostic plot diagnostics.likelihood.pdf can help determine whether the chain is stationary. Burn-in can be continued from a checkpoint file with ExtendBurnin.

post.n

The number of posterior samples to collect.

post.space

The number of iterations between collected samples.
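
As a rough guide to run length, the iteration counts implied by the default argument values can be worked out as below. This is a sketch under two assumptions that are not stated in this page: that each chain runs about post.n * post.space iterations after burn-in, and that every chain contributes post.n posterior samples.

```r
# Default values taken from the Usage section.
burnin            <- 1000
burnin.multiplier <- 10
post.n            <- 200
post.space        <- 100
num.child.process <- 20

# Total burn-in iterations per chain (stated in the burnin argument).
total.burnin <- burnin * burnin.multiplier

# Assumption: approximate number of sampling iterations per chain.
sampling.iters <- post.n * post.space

# Assumption: posterior samples pooled over all chains.
total.posterior.samples <- post.n * num.child.process

cat("Burn-in iterations per chain: ", total.burnin, "\n")
cat("Sampling iterations per chain:", sampling.iters, "\n")
cat("Posterior samples in total:   ", total.posterior.samples, "\n")
```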

post.cpiter

The number of concentration-parameter sampling iterations to perform after each iteration.

post.verbosity

Verbosity of debugging statements. No need to change except for development purposes.

CPU.cores

Number of CPUs to use; this should be no more than num.child.process.

num.child.process

Number of posterior sampling chains; can be set to 1 for testing. We recommend 20 for real data analysis.
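
One way to respect the constraint that CPU.cores should be no more than num.child.process is to cap it by both the chain count and the cores detected on the machine. This is a minimal sketch that uses parallel::detectCores() from base R; it is a suggestion, not something the function does for you.

```r
num.child.process <- 20

# detectCores() can return NA on some platforms, so fall back to 1.
available.cores <- parallel::detectCores()
if (is.na(available.cores)) available.cores <- 1L

# Never request more cores than there are chains to run.
CPU.cores <- min(available.cores, num.child.process)
```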

high.confidence.prop

Raw clusters of mutations found in at least a proportion high.confidence.prop of the posterior samples are treated as high-confidence signatures.
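
The thresholding rule can be illustrated with made-up numbers. The cluster names, the counts, and the total of 100 posterior samples are hypothetical; this sketch shows only the comparison against high.confidence.prop, not how the package merges raw clusters.

```r
high.confidence.prop <- 0.9

# Hypothetical total number of posterior samples.
n.post.samples <- 100

# Hypothetical number of posterior samples in which each cluster was found.
found.in <- c(clusterA = 100, clusterB = 93, clusterC = 40)

prop.found <- found.in / n.post.samples

# Clusters at or above the threshold count as high confidence.
high.conf <- names(prop.found)[prop.found >= high.confidence.prop]
low.conf  <- names(prop.found)[prop.found <  high.confidence.prop]
```

Here clusterA and clusterB pass the threshold, while clusterC would be reported among the low-confidence signatures.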

hc.cutoff

Deprecated; use merge.raw.cluster.args instead.

merge.raw.cluster.args

See default_merge_raw_cluster_args in package hdpx.

overwrite

If TRUE, overwrite out.dir if it exists; otherwise raise an error.

out.dir

If not NULL, a character string specifying a directory that will be created for the output, including csv files and plots (pdfs) of extracted signatures and their exposures. If NULL, no directory will be created and no files will be generated.

gamma.alpha

Shape parameter of the gamma distribution prior for the Dirichlet process concentration parameters α_0 and all α_j in Figure B.1 of

  • https://www.repository.cam.ac.uk/bitstream/handle/1810/275454/Roberts-2018-PhD.pdf

gamma.beta

Inverse scale parameter (rate parameter) of the gamma distribution prior for the Dirichlet process concentration parameters: β_0 and all β_j in Figure B.1 of

  • https://www.repository.cam.ac.uk/bitstream/handle/1810/275454/Roberts-2018-PhD.pdf

We recommend gamma.alpha = 1 and gamma.beta = 20 for single-base-substitution signature extraction; gamma.alpha = 1 and gamma.beta = 50 for doublet-base-substitution and indel signature extraction.
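
To get a feel for what the two recommended settings imply, recall that a gamma distribution with shape a and rate b has mean a / b. The sketch below compares the prior mean and a central 95% interval for the two settings; the helper gamma.mean is defined here for illustration and is not part of mSigHdp or hdpx.

```r
# Mean of a gamma distribution parameterized by shape and rate.
gamma.mean <- function(shape, rate) shape / rate

# Recommended setting for single-base-substitution signatures.
sbs.mean <- gamma.mean(shape = 1, rate = 20)

# Recommended setting for doublet-base-substitution and indel signatures.
dbs.mean <- gamma.mean(shape = 1, rate = 50)

# Central 95% prior intervals from the gamma quantile function.
sbs.interval <- qgamma(c(0.025, 0.975), shape = 1, rate = 20)
dbs.interval <- qgamma(c(0.025, 0.975), shape = 1, rate = 50)
```

The larger rate gives a prior concentrated on smaller concentration parameters.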

checkpoint

If TRUE, then

  • Checkpoint each final Gibbs sample chain to the current working directory, in a file called mSigHdp.sample.checkpoint.x.Rdata, where x depends on seedNumber.

  • Periodically checkpoint the burnin state to the current working directory, in files called mSigHdp.burnin.checkpoint.x.Rdata, where x depends on the seedNumber.
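
Checkpoint files can be located afterwards by matching the file names given above. This is a minimal sketch; the example file names, including the value 123 standing in for x, are hypothetical and are used only to exercise the patterns.

```r
# Regular expressions for the two kinds of checkpoint file names.
burnin.pattern <- "^mSigHdp\\.burnin\\.checkpoint\\..+\\.Rdata$"
sample.pattern <- "^mSigHdp\\.sample\\.checkpoint\\..+\\.Rdata$"

# Hypothetical file names, used only to check the patterns.
example.names <- c("mSigHdp.burnin.checkpoint.123.Rdata",
                   "mSigHdp.sample.checkpoint.123.Rdata",
                   "unrelated.Rdata")
is.burnin <- grepl(burnin.pattern, example.names)
is.sample <- grepl(sample.pattern, example.names)

# In a real session, list matching files in the current working directory.
burnin.files <- list.files(getwd(), pattern = burnin.pattern)
sample.files <- list.files(getwd(), pattern = sample.pattern)
```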

downsample_threshold

See downsample_spectra and show_downsample_curves.

Details

Please see our paper at https://www.biorxiv.org/content/10.1101/2022.01.31.478587v1 for suggestions on argument values to use.
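
Before committing to a full run, it can help to assemble a small argument list for a quick smoke test. The sketch below uses only argument names from the Usage section. The toy catalog and every value in test.args are untested guesses chosen to keep the run short; they are not recommendations, and such a small run may not yield meaningful signatures. The call itself is disabled by default through the hypothetical flag run.example.

```r
# Hypothetical toy catalog: 96 mutation types by 10 samples.
set.seed(123)
toy.catalog <- matrix(rpois(96 * 10, lambda = 20), nrow = 96)
colnames(toy.catalog) <- paste0("Sample", 1:10)

# Small settings intended only for a smoke test, not for real analysis.
test.args <- list(
  input.catalog     = toy.catalog,
  seedNumber        = 123,
  K.guess           = 5,
  multi.types       = FALSE,
  burnin            = 100,
  burnin.multiplier = 2,
  post.n            = 5,
  post.space        = 5,
  CPU.cores         = 1,
  num.child.process = 1,   # 1 chain is allowed for testing
  out.dir           = NULL, # do not write any output files
  checkpoint        = FALSE
)

run.example <- FALSE  # set to TRUE to actually run the sampler
if (run.example && requireNamespace("mSigHdp", quietly = TRUE)) {
  retval <- do.call(mSigHdp::RunHdpxParallel, test.args)
  str(retval$signature)
}
```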

Value

Invisibly, a list with the following elements:

signature

The extracted signature profiles as a matrix; rows are mutation types, columns are signatures with high confidence.

signature.post.samp.number

A data frame with two columns. The first column corresponds to each signature in signature and the second column contains the number of posterior samples that found the raw clusters contributing to the signature.

signature.cdc

A numeric data frame. Columns correspond to signatures as in signature. Rows correspond to either biological samples or to parent and grandparent Dirichlet processes.

exposureProbs

The inferred exposures as a matrix of mutation probabilities; rows are signatures, columns are samples (e.g. tumors). This is similar to signature.cdc, but every column was normalized to sum to 1.
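
Normalizing every column to sum to 1 can be sketched in base R as below. The toy matrix, with signatures in rows and samples in columns, is hypothetical, and the sketch makes no claim about how signature.cdc is oriented or processed inside the package.

```r
# Hypothetical counts: 3 signatures (rows) by 2 samples (columns).
toy.counts <- matrix(c(30, 10, 60,
                       5,  5, 90),
                     nrow = 3,
                     dimnames = list(c("Sig1", "Sig2", "Sig3"),
                                     c("TumorA", "TumorB")))

# Divide each column by its total so that every column sums to 1.
exposure.probs <- sweep(toy.counts, 2, colSums(toy.counts), "/")
```

Each column of exposure.probs can then be read as the proportion of a sample's mutations attributed to each signature.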

low.confidence.signature

The profiles of signatures extracted with low confidence as a matrix; rows are mutation types, columns are signatures found in fewer than a proportion high.confidence.prop of the posterior samples.

low.confidence.post.samp.number

Analogous to signature.post.samp.number, except that this one is for signatures in low.confidence.signature.

low.confidence.cdc

Analogous to signature.cdc, except that columns in this matrix correspond to columns in low.confidence.signature.

extracted.retval

A list object returned from extract_components in package hdpx.


steverozen/mSigHdp documentation built on Feb. 6, 2023, 1:36 a.m.