extract_signatures: Extract mutational signatures
In kgori/sigfit: Flexible Bayesian inference of mutational signatures

`extract_signatures` performs MCMC sampling to infer a set of mutational signatures and their exposures from a collection of mutational catalogues. Four signature models are available: multinomial, Poisson, normal and negative binomial. The normal model can be used when `counts` contains continuous (non-integer) values, while the negative binomial model is a more noise-robust version of the Poisson model. (However, using the negative binomial model for signature extraction is discouraged because it is inefficient.)
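To make the relationship between signatures, exposures and catalogues concrete, the following base R sketch simulates a toy catalogue under the usual multinomial formulation, in which each sample's counts are drawn with probabilities given by its exposures multiplied by the signatures. This formulation, the dimensions and all variable names are assumptions chosen for illustration; the code does not call sigfit and is not its implementation.

```r
set.seed(1)

# Hypothetical toy dimensions (real catalogues typically use 96 mutation types)
n_samples <- 4
n_sigs    <- 2
n_types   <- 6

# Signatures: one row per signature, each row a probability distribution
signatures <- matrix(runif(n_sigs * n_types), nrow = n_sigs)
signatures <- signatures / rowSums(signatures)

# Exposures: one row per sample, each row a probability distribution
exposures <- matrix(runif(n_samples * n_sigs), nrow = n_samples)
exposures <- exposures / rowSums(exposures)

# Expected mutation-type probabilities per sample (rows sum to 1)
probs <- exposures %*% signatures

# Draw each sample's catalogue from a multinomial with a fixed mutation total
totals <- c(200, 150, 300, 250)
counts <- t(sapply(seq_len(n_samples), function(i) {
  rmultinom(1, size = totals[i], prob = probs[i, ])[, 1]
}))

dim(counts)  # one row per sample, one column per mutation type
```

A matrix shaped like `counts` here (samples in rows, mutation types in columns) is the layout that the `counts` argument expects.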

extract_signatures(
  counts,
  nsignatures,
  model = "multinomial",
  opportunities = NULL,
  sig_prior = NULL,
  exp_prior = NULL,
  dpp = FALSE,
  dpp_conc = 1,
  stanfunc = "sampling",
  chains = 1,
  ...
)

`counts`	Numeric matrix of observed mutation counts, with one row per sample and one column per mutation type.
`nsignatures`	Integer or integer vector indicating the number(s) of signatures to extract.
`model`	Name of the model to sample from. Admits character values `"multinomial"` (default), `"poisson"`, `"negbin"`, `"normal"`, `"nmf"` (an alias for `"multinomial"`), and `"emu"` (an alias for `"poisson"`).
`opportunities`	Numeric matrix of optional mutational opportunities, with one row per sample and one column per mutation type. It also admits character values `"human-genome"` or `"human-exome"`, in which case the mutational opportunities of the reference human genome/exome will be used for every sample.
`sig_prior`	Numeric matrix with one row per signature and one column per mutation type, to be used as the Dirichlet priors for the mutational signatures. Only used when a single value is provided for `nsignatures`. Default priors are uniform.
`exp_prior`	Numeric matrix with one row per sample and one column per signature, to be used as the Dirichlet priors for the signature exposures. Default priors are uniform.
`dpp`	Logical indicating whether to use a Dirichlet process prior to infer the number of mutational signatures (default is `FALSE`).
`dpp_conc`	Numeric indicating the value of the concentration parameter for the Dirichlet process prior (default is 1). Only used if `dpp=TRUE`.
`stanfunc`	Character indicating the choice of rstan inference strategy. Admits values `"sampling"`, `"optimizing"` and `"vb"`. The default value is `"sampling"`, which corresponds to the full Bayesian MCMC approach. Alternatively, `"optimizing"` returns the Maximum a Posteriori (MAP) point estimates via numerical optimization, while `"vb"` uses Variational Bayes to approximate the full posterior.
`chains`	Integer indicating the number of chains used for MCMC (default is 1). The use of multiple chains for signature extraction is discouraged, as it can result in an inference problem called 'label switching'. This value is passed to `rstan::sampling`.
`...`	Additional arguments to be passed to the sampling function (by default, `rstan::sampling`).
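As a minimal sketch of how the prior arguments could be shaped, the base R code below builds `sig_prior` and `exp_prior` matrices with the dimensions described above. It assumes a catalogue of 21 samples and 96 mutation types (the expected shape of `counts_21breast`) and treats the matrix entries as Dirichlet concentration parameters, so a matrix of ones corresponds to a uniform prior. The upweighted block of mutation types is arbitrary, and the `extract_signatures` call is left commented out because it needs sigfit and a full sampling run.

```r
# Assumed dimensions: 21 samples, 96 mutation types, 4 signatures
n_samples <- 21
n_types   <- 96
n_sigs    <- 4

# Uniform Dirichlet prior on signatures: every concentration parameter is 1
sig_prior <- matrix(1, nrow = n_sigs, ncol = n_types)

# Hypothetical informative prior: favour an arbitrary block of mutation types
# in the first signature by raising their concentration parameters
sig_prior[1, 1:16] <- 10

# Uniform Dirichlet prior on exposures: one row per sample, one column per signature
exp_prior <- matrix(1, nrow = n_samples, ncol = n_sigs)

# Prior mean of each signature (Dirichlet mean = alpha / sum(alpha))
prior_mean <- sig_prior / rowSums(sig_prior)

## Not run:
# samples <- extract_signatures(counts_21breast, nsignatures = n_sigs,
#                               sig_prior = sig_prior, exp_prior = exp_prior,
#                               iter = 800)
## End(Not run)
```

Because `sig_prior` is only used when a single value is given for `nsignatures`, its number of rows should match that value.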

A list with two elements:

`data`: list containing the input data supplied to the model.
`result`: object of class stanfit, containing the output MCMC samples, as well as information about the model and the sampling process.

The model parameters (such as signatures and exposures) can be extracted from this object using `retrieve_pars`. If a range of numbers of signatures is provided via the `nsignatures` argument, a list is returned in which the N-th element contains the extraction results for N signatures, as a list with the structure described above.
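The indexing convention for a range of signature numbers can be illustrated with a mock object. The list below is a hand-built stand-in, not real sigfit output: the `result` entries are placeholder strings rather than stanfit objects, and leaving unused positions empty is an assumption. The commented lines sketch how the same indexing might be applied to a real run, assuming `retrieve_pars` accepts a `par = "signatures"` argument.

```r
# Mock return value for nsignatures = 2:4 (placeholders only, NOT sigfit output)
nsignatures <- 2:4
results <- vector("list", max(nsignatures))
for (n in nsignatures) {
  results[[n]] <- list(
    data   = list(),                                          # stand-in for the model input data
    result = paste("stanfit placeholder for", n, "signatures")
  )
}

# The N-th element holds the extraction for N signatures
fit_3 <- results[[3]]
names(fit_3)

# Positions outside the requested range are empty in this mock
is.null(results[[1]])

## Not run:
# samples <- extract_signatures(counts_21breast, nsignatures = 2:4, iter = 800)
# signatures_3 <- retrieve_pars(samples[[3]], par = "signatures")
## End(Not run)
```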

## Not run: 
# Load example mutational catalogues
data("counts_21breast")

# Extract 2 to 6 signatures using the NMF (multinomial) model
# (400 warmup iterations + 400 sampling iterations - use more in practice)
samples_nmf <- extract_signatures(counts_21breast, nsignatures = 2:6,
                                  model = "nmf", iter = 800)

# Extract 4 signatures using the EMu (Poisson) model
# (400 warmup iterations + 800 sampling iterations - use more in practice)
samples_emu <- extract_signatures(counts_21breast, nsignatures = 4, model = "emu",
                                  opportunities = "human-genome",
                                  iter = 1200, warmup = 400)

## End(Not run)