extract_signatures: Extract mutational signatures

View source: R/sigfit_estimation.R

Description

extract_signatures performs MCMC sampling to infer a set of mutational signatures and their exposures from a collection of mutational catalogues. Four models of signatures are available: multinomial, Poisson, normal and negative binomial. The normal model can be used when counts contains continuous (non-integer) values, while the negative binomial model is a more noise-robust version of the Poisson model. (However, the use of the negative binomial model for signature extraction is discouraged due to its inefficiency.)
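As a rough illustration of what the multinomial model assumes, the sketch below simulates toy catalogues in base R: each sample's mutation-type probabilities are a mixture of signatures weighted by that sample's exposures, and counts are drawn from a multinomial with those probabilities. All names and dimensions here (rdirichlet_rows, counts_sim, 5 samples, 6 mutation types, 200 mutations per sample) are hypothetical and chosen for illustration; this is not sigfit's internal code.

```r
# Hypothetical toy dimensions, chosen only for illustration
set.seed(1)
n_samples <- 5
n_types   <- 6   # e.g. 96 when using trinucleotide substitution types
n_sigs    <- 2

# Draw each row from a flat Dirichlet by normalising gamma variates
rdirichlet_rows <- function(n, k) {
  g <- matrix(rgamma(n * k, shape = 1), nrow = n, ncol = k)
  g / rowSums(g)
}

signatures <- rdirichlet_rows(n_sigs, n_types)     # signatures x mutation types
exposures  <- rdirichlet_rows(n_samples, n_sigs)   # samples x signatures

# Per-sample mutation-type probabilities: a mixture of signatures
probs <- exposures %*% signatures

# One multinomial draw per sample; result has one row per sample
counts_sim <- t(sapply(seq_len(n_samples), function(i) {
  as.vector(rmultinom(1, size = 200, prob = probs[i, ]))
}))
```

A matrix shaped like counts_sim (samples in rows, mutation types in columns) is what the counts argument expects.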

Usage

extract_signatures(
  counts,
  nsignatures,
  model = "multinomial",
  opportunities = NULL,
  sig_prior = NULL,
  exp_prior = NULL,
  dpp = FALSE,
  dpp_conc = 1,
  stanfunc = "sampling",
  chains = 1,
  ...
)

Arguments

counts

Numeric matrix of observed mutation counts, with one row per sample and one column per mutation type.

nsignatures

Integer or integer vector indicating the number(s) of signatures to extract.

model

Name of the model to sample from. Admits character values "multinomial" (default), "poisson", "negbin", "normal", "nmf" (an alias for "multinomial"), and "emu" (an alias for "poisson").

opportunities

Numeric matrix of optional mutational opportunities, with one row per sample and one column per mutation type. It also admits character values "human-genome" or "human-exome", in which case the mutational opportunities of the reference human genome/exome will be used for every sample.

sig_prior

Numeric matrix with one row per signature and one column per mutation type, to be used as the Dirichlet priors for the mutational signatures. Only used when a single value is provided for nsignatures. Default priors are uniform.

exp_prior

Numeric matrix with one row per sample and one column per signature, to be used as the Dirichlet priors for the signature exposures. Default priors are uniform.

dpp

Logical indicating whether to use a Dirichlet process prior to infer the number of mutational signatures (default is FALSE).

dpp_conc

Numeric indicating the value of the concentration parameter for the Dirichlet process prior (default is 1). Only used if dpp=TRUE.

stanfunc

Character indicating the choice of rstan inference strategy. Admits values "sampling", "optimizing" and "vb". The default value is "sampling", which corresponds to the full Bayesian MCMC approach. Alternatively, "optimizing" returns the Maximum a Posteriori (MAP) point estimates via numerical optimization, while "vb" uses Variational Bayes to approximate the full posterior.

chains

Integer indicating the number of chains used for MCMC (default is 1). The use of multiple chains for signature extraction is discouraged, as it can result in an inference problem called 'label switching'. This value is passed to rstan::sampling.

...

Additional arguments to be passed to the sampling function (by default, rstan::sampling).
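The matrix arguments above must agree in shape with one another. The following minimal sketch builds toy inputs in base R to make those shapes concrete; the data values, object names (counts_toy, sig_prior_toy, exp_prior_toy, opps_toy) and the choice of two signatures are hypothetical. The call itself is wrapped in if (FALSE), mirroring the "Not run" convention, because it requires sigfit and rstan and has not been checked on this toy data.

```r
# Hypothetical toy data: 3 samples (rows) x 4 mutation types (columns)
counts_toy <- matrix(
  c(12, 3, 7,  0,
     5, 9, 1,  4,
     8, 2, 6, 10),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("sample", 1:3), paste0("type", 1:4))
)

n_sigs <- 2

# Flat Dirichlet priors (all concentration parameters equal to 1)
sig_prior_toy <- matrix(1, nrow = n_sigs, ncol = ncol(counts_toy))   # signatures x types
exp_prior_toy <- matrix(1, nrow = nrow(counts_toy), ncol = n_sigs)   # samples x signatures

# Opportunities share the shape of the counts matrix
opps_toy <- matrix(1, nrow = nrow(counts_toy), ncol = ncol(counts_toy),
                   dimnames = dimnames(counts_toy))

# Not run: needs sigfit and rstan installed
if (FALSE) {
  fit_toy <- sigfit::extract_signatures(counts_toy,
                                        nsignatures = n_sigs,
                                        model = "poisson",
                                        opportunities = opps_toy,
                                        sig_prior = sig_prior_toy,
                                        exp_prior = exp_prior_toy,
                                        iter = 800)
}
```

Note that sig_prior is only used when nsignatures is a single value, as it is here.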

Value

A list with two elements:

The model parameters (such as signatures and exposures) can be extracted from this object using retrieve_pars. If a range of numbers of signatures is provided via the nsignatures argument, a list is returned in which the N-th element contains the extraction results for N signatures, as a list with the structure described above.
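The indexing rule for a range of signature numbers can be sketched with a mock list in base R. The object results_mock and its placeholder strings are hypothetical stand-ins for real extraction results, and leaving the unused positions empty is an assumption of this mock rather than documented behaviour. In the guarded block, the par argument and its value "signatures" are likewise assumptions about retrieve_pars; check ?retrieve_pars before relying on them.

```r
# Mock of the return value for nsignatures = 2:6 (placeholders, not real fits)
nsignatures <- 2:6
results_mock <- vector("list", max(nsignatures))
for (n in nsignatures) {
  results_mock[[n]] <- paste("extraction results for", n, "signatures")
}

# The N-th element holds the results for N signatures
four_sigs <- results_mock[[4]]

# Not run: needs a real fit, e.g. samples_nmf from the Examples section
if (FALSE) {
  # 'par = "signatures"' is an assumed argument; see ?retrieve_pars
  sigs <- sigfit::retrieve_pars(samples_nmf[[4]], par = "signatures")
}
```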

Examples

## Not run: 
# Load example mutational catalogues
data("counts_21breast")

# Extract 2 to 6 signatures using the NMF (multinomial) model
# (400 warmup iterations + 400 sampling iterations - use more in practice)
samples_nmf <- extract_signatures(counts_21breast, nsignatures = 2:6,
                                  model = "nmf", iter = 800)

# Extract 4 signatures using the EMu (Poisson) model
# (400 warmup iterations + 800 sampling iterations - use more in practice)
samples_emu <- extract_signatures(counts_21breast, nsignatures = 4, model = "emu",
                                  opportunities = "human-genome",
                                  iter = 1200, warmup = 400)

## End(Not run)
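The iteration counts quoted in the comments above follow from how rstan::sampling treats its arguments: iter is the total number of iterations including warmup, and warmup defaults to floor(iter / 2). The small helper below, sampling_iters, is a hypothetical name introduced here only to make that arithmetic explicit.

```r
# Post-warmup (sampling) iterations, given rstan's iter and warmup arguments
sampling_iters <- function(iter, warmup = floor(iter / 2)) {
  iter - warmup
}

nmf_iters <- sampling_iters(800)                 # first example: 400 warmup + 400 sampling
emu_iters <- sampling_iters(1200, warmup = 400)  # second example: 400 warmup + 800 sampling
```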

kgori/sigfit documentation built on Feb. 3, 2022, 12:04 p.m.