View source: R/sigfit_estimation.R
fit_extract_signatures performs MCMC sampling to simultaneously fit a set of 'fixed' signatures to a collection of mutational catalogues (as in fit_signatures) and extract a number of 'additional' signatures from the catalogues (as in extract_signatures). Four models of signatures are available: multinomial, Poisson, normal and negative binomial. The normal model can be used when counts contains continuous (non-integer) values, while the negative binomial model is a more noise-robust version of the Poisson model. (However, the use of the negative binomial model for signature extraction is discouraged due to its inefficiency.)
counts |
Numeric matrix of observed mutation counts, with one row per sample and one column per mutation type. |
signatures |
'Fixed' mutational signatures to be fitted; either a numeric matrix with one row per signature and one column per mutation type, or a list of matrices generated via |
num_extra_sigs |
Numeric indicating the number of 'additional' signatures to be extracted. |
model |
Name of the model to sample from. Admits character values |
opportunities |
Numeric matrix of optional mutational opportunities, with one row per sample and one column per mutation type. It also admits character values |
sig_prior |
Numeric matrix with one row per 'additional' signature and one column per mutation type, to be used as the Dirichlet priors for the additional signatures to be extracted. Default priors are uniform. |
exp_prior |
Numeric matrix with one row per sample and one column per signature (including both 'fixed' and 'additional' signatures), to be used as the Dirichlet priors for the signature exposures. Default priors are uniform. |
dpp |
Logical indicating whether to use a Dirichlet process prior to infer the number of mutational signatures (default is |
dpp_conc |
Numeric indicating the value of the concentration parameter for the Dirichlet process prior (default is 1). Only used if |
stanfunc |
Character indicating the choice of rstan inference strategy. Admits values |
chains |
Integer indicating the number of chains used for MCMC (default is 1). The use of multiple chains for signature extraction is discouraged, as it can result in an inference problem called 'label switching'. This value is passed to |
... |
Additional arguments to be passed to the sampling function (by default, |
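The dimension requirements for sig_prior and exp_prior can be illustrated with a minimal base-R sketch. The sample, mutation-type and signature counts below are hypothetical values chosen for illustration, and the sketch assumes that the matrix entries are interpreted as Dirichlet concentration parameters, so that a matrix of ones corresponds to the uniform default.

```r
# Hypothetical dimensions, chosen only for illustration
n_samples <- 2   # number of rows in `counts`
n_types   <- 96  # number of columns in `counts` (mutation types)
n_fixed   <- 3   # number of rows in `signatures` ('fixed' signatures)
n_extra   <- 1   # value passed as `num_extra_sigs`

# One row per 'additional' signature, one column per mutation type.
# All-ones concentration parameters give a uniform Dirichlet distribution.
sig_prior <- matrix(1, nrow = n_extra, ncol = n_types)

# One row per sample, one column per signature (fixed + additional)
exp_prior <- matrix(1, nrow = n_samples, ncol = n_fixed + n_extra)

# Illustrative non-uniform choice (an assumption, not a recommendation):
# place more prior weight on the first fixed signature in the first sample
exp_prior[1, 1] <- 5
```

Matrices built this way could then be supplied through the sig_prior and exp_prior arguments; checking dim() against the dimensions of counts and signatures before sampling is a cheap way to catch shape mismatches.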
A list with two elements:
$data: list containing the input data supplied to the model.
$result: object of class stanfit, containing the output MCMC samples, as well as information about the model and sampling process.
The model parameters (such as signatures and exposures) can be extracted from this object using retrieve_pars.
## Not run:
# Simulate two catalogues using signatures 1, 4, 5, 7, with
# proportions 4:2:3:1 and 2:3:4:1, respectively
data("cosmic_signatures_v2")
probs <- rbind(c(0.4, 0.2, 0.3, 0.1) %*% cosmic_signatures_v2[c(1, 4, 5, 7), ],
               c(0.2, 0.3, 0.4, 0.1) %*% cosmic_signatures_v2[c(1, 4, 5, 7), ])
mutations <- rbind(t(rmultinom(1, 20000, probs[1, ])),
                   t(rmultinom(1, 20000, probs[2, ])))

# Assuming that we do not know signature 7 a priori, but we know the others
# to be present, extract 1 signature while fitting signatures 1, 4 and 5.
# (400 warmup iterations + 400 sampling iterations - use more in practice)
mcmc_samples <- fit_extract_signatures(mutations, cosmic_signatures_v2[c(1, 4, 5), ],
                                       num_extra_sigs = 1, model = "nmf", iter = 800)

# Plot original and extracted signature 7
extr_sigs <- retrieve_pars(mcmc_samples, "signatures")
plot_spectrum(cosmic_signatures_v2[7, ], pdf_path = "COSMIC_Sig7.pdf", name = "COSMIC sig. 7")
plot_spectrum(extr_sigs, pdf_path = "Extracted_Sigs.pdf")
## End(Not run)
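Besides plotting, the agreement between an extracted signature and a reference one can be summarised numerically. The sketch below uses cosine similarity, a common choice for comparing mutational spectra; the helper cosine_similarity is a hypothetical name defined here, not a function taken from this page, and it is demonstrated on toy vectors rather than on real sampler output.

```r
# Hypothetical helper (not part of the documented API): cosine similarity
# between two spectra given as numeric vectors or single-row tables
cosine_similarity <- function(a, b) {
  a <- as.numeric(unlist(a))
  b <- as.numeric(unlist(b))
  stopifnot(length(a) == length(b))
  sum(a * b) / sqrt(sum(a^2) * sum(b^2))
}

# Toy spectra over three mutation types (illustrative values only)
reference_sig <- c(0.5, 0.3, 0.2)
extracted_sig <- c(0.45, 0.35, 0.2)

# Values close to 1 indicate that the two spectra have similar shapes
similarity <- cosine_similarity(reference_sig, extracted_sig)
```

To apply this to the example above, the reference would be cosmic_signatures_v2[7, ] and the other argument a single extracted signature taken from extr_sigs. How that row is accessed depends on the structure returned by retrieve_pars, which can be inspected with str(extr_sigs).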