fit_extract_signatures: Fit-and-extract mutational signatures


View source: R/sigfit_estimation.R

Description

fit_extract_signatures performs MCMC sampling to simultaneously fit a set of 'fixed' signatures to a collection of mutational catalogues (as in fit_signatures) and extract a number of 'additional' signatures from the catalogues (as in extract_signatures). Four models of signatures are available: multinomial, Poisson, normal and negative binomial. The normal model can be used when counts contains continuous (non-integer) values, while the negative binomial model is a more noise-robust version of the Poisson model. (However, the use of the negative binomial model for signature extraction is discouraged due to its inefficiency.)

Usage

fit_extract_signatures(
  counts,
  signatures,
  num_extra_sigs,
  model = "multinomial",
  opportunities = NULL,
  sig_prior = NULL,
  exp_prior = NULL,
  dpp = FALSE,
  dpp_conc = 1,
  stanfunc = "sampling",
  chains = 1,
  ...
)

Arguments

counts

Numeric matrix of observed mutation counts, with one row per sample and one column per mutation type.

signatures

'Fixed' mutational signatures to be fitted; either a numeric matrix with one row per signature and one column per mutation type, or a list of matrices generated via retrieve_pars.

num_extra_sigs

Numeric indicating the number of 'additional' signatures to be extracted.

model

Name of the model to sample from. Admits character values "multinomial" (default), "poisson", "negbin", "normal", "nmf" (an alias for "multinomial"), and "emu" (an alias for "poisson").

opportunities

Optional numeric matrix of mutational opportunities, with one row per sample and one column per mutation type. It also admits the character values "human-genome" or "human-exome", in which case the mutational opportunities of the reference human genome/exome are used for every sample.

sig_prior

Numeric matrix with one row per 'additional' signature and one column per mutation type, to be used as the Dirichlet priors for the additional signatures to be extracted. Default priors are uniform.

exp_prior

Numeric matrix with one row per sample and one column per signature (including both 'fixed' and 'additional' signatures), to be used as the Dirichlet priors for the signature exposures. Default priors are uniform.

dpp

Logical indicating whether to use a Dirichlet process prior to infer the number of mutational signatures (default is FALSE).

dpp_conc

Numeric indicating the value of the concentration parameter for the Dirichlet process prior (default is 1). Only used if dpp=TRUE.

stanfunc

Character indicating the choice of rstan inference strategy. Admits values "sampling", "optimizing" and "vb". The default value is "sampling", which corresponds to the full Bayesian MCMC approach. Alternatively, "optimizing" returns the Maximum a Posteriori (MAP) point estimates via numerical optimization, while "vb" uses Variational Bayes to approximate the full posterior.

chains

Integer indicating the number of chains used for MCMC (default is 1). The use of multiple chains for signature extraction is discouraged, as it can result in an inference problem called 'label switching'. This value is passed to rstan::sampling.

...

Additional arguments to be passed to the sampling function (by default, rstan::sampling).
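
The shapes expected for sig_prior and exp_prior follow from the descriptions above. The following is a minimal base-R sketch of how such matrices might be built; the sizes and variable names (n_samples, n_types, n_fixed_sigs, num_extra) are hypothetical and chosen only for illustration. It assumes that a matrix of ones expresses a uniform Dirichlet prior, which holds for a Dirichlet distribution whose concentration parameters are all equal to 1.

```r
# Hypothetical problem sizes (assumptions for illustration only)
n_samples    <- 2    # rows of `counts`
n_types      <- 96   # columns of `counts` (mutation types)
n_fixed_sigs <- 3    # rows of `signatures`
num_extra    <- 1    # value that would be passed as num_extra_sigs

# sig_prior: one row per 'additional' signature, one column per mutation type.
# All-ones rows correspond to a uniform Dirichlet prior.
sig_prior <- matrix(1, nrow = num_extra, ncol = n_types)

# exp_prior: one row per sample, one column per signature,
# counting both 'fixed' and 'additional' signatures.
exp_prior <- matrix(1, nrow = n_samples, ncol = n_fixed_sigs + num_extra)

# Check the shapes before passing the matrices to the fitting function
stopifnot(all(dim(sig_prior) == c(num_extra, n_types)))
stopifnot(all(dim(exp_prior) == c(n_samples, n_fixed_sigs + num_extra)))
```

These matrices could then be supplied through the sig_prior and exp_prior arguments. Raising individual entries above 1 would concentrate prior mass on the corresponding mutation types or signatures.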

Value

A list with two elements. The model parameters (such as signatures and exposures) can be extracted from this object using retrieve_pars.

Examples

## Not run: 
# Simulate two catalogues using signatures 1, 4, 5, 7, with
# proportions 4:2:3:1 and 2:3:4:1, respectively
library(sigfit)
data("cosmic_signatures_v2")
probs <- rbind(c(0.4, 0.2, 0.3, 0.1) %*% cosmic_signatures_v2[c(1, 4, 5, 7), ],
               c(0.2, 0.3, 0.4, 0.1) %*% cosmic_signatures_v2[c(1, 4, 5, 7), ])
mutations <- rbind(t(rmultinom(1, 20000, probs[1, ])),
                   t(rmultinom(1, 20000, probs[2, ])))

# Assuming that we do not know signature 7 a priori, but we know the others
# to be present, extract 1 signature while fitting signatures 1, 4 and 5.
# (400 warmup iterations + 400 sampling iterations - use more in practice)
mcmc_samples <- fit_extract_signatures(mutations, cosmic_signatures_v2[c(1, 4, 5), ],
                                       num_extra_sigs = 1, model = "nmf", iter = 800)

# Plot original and extracted signature 7
extr_sigs <- retrieve_pars(mcmc_samples, "signatures")
plot_spectrum(cosmic_signatures_v2[7, ], pdf_path = "COSMIC_Sig7.pdf", name="COSMIC sig. 7")
plot_spectrum(extr_sigs, pdf_path = "Extracted_Sigs.pdf")

## End(Not run)

kgori/sigfit documentation built on Feb. 3, 2022, 12:04 p.m.