get_population_signatures: SignIT-Pop inference of populations and signatures


View source: R/get_population_signatures.R

Description

Jointly infers mutational subpopulations and their associated mutation signature exposures.

Usage

get_population_signatures(
  mutation_table,
  reference_signatures = NULL,
  subset_signatures = TRUE,
  n_populations = NULL,
  genome = NULL,
  method = "vb",
  n_chains = 10,
  n_cores = 1,
  n_iter = 300,
  n_adapt = 200,
  prevalences = NULL
)

Arguments

mutation_table

Table of mutations, one per row. The minimum input requires the following columns:

  • total_depth: Total number of reads covering the mutated locus.

  • alt_depth: Total number of mutant reads covering the mutated locus.

  • tumour_copy: Tumour copy number at the mutated locus.

  • normal_copy: Normal copy number at the mutated locus.

  • tumour_content: Estimated tumour content as a fraction between 0 and 1. Must be the same value throughout the whole table.
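As a minimal sketch of the column requirements listed above, the following builds a toy table and checks it before inference. The values are invented for illustration, and check_mutation_table is a hypothetical helper, not part of SignIT. The columns that identify each mutation (or a mutation_type column) are omitted here, so this table shows only the depth and copy number fields.

```r
# Toy values for illustration only; not real sequencing data.
mutation_table <- data.frame(
  total_depth    = c(80L, 95L, 60L, 110L),
  alt_depth      = c(20L, 45L, 12L, 30L),
  tumour_copy    = c(2L, 3L, 2L, 4L),
  normal_copy    = c(2L, 2L, 2L, 2L),
  tumour_content = 0.7  # recycled, so it is identical in every row
)

required_columns <- c(
  "total_depth", "alt_depth", "tumour_copy", "normal_copy", "tumour_content"
)

# Hypothetical helper (not a SignIT function): stops with an error if the
# table violates the documented minimum requirements.
check_mutation_table <- function(tbl) {
  missing_columns <- setdiff(required_columns, names(tbl))
  if (length(missing_columns) > 0) {
    stop("Missing columns: ", paste(missing_columns, collapse = ", "))
  }
  if (length(unique(tbl$tumour_content)) != 1) {
    stop("tumour_content must be the same value in every row")
  }
  if (tbl$tumour_content[1] < 0 || tbl$tumour_content[1] > 1) {
    stop("tumour_content must be a fraction between 0 and 1")
  }
  # Sanity check: mutant reads are a subset of the reads covering the locus.
  if (any(tbl$alt_depth > tbl$total_depth)) {
    stop("alt_depth cannot exceed total_depth")
  }
  invisible(TRUE)
}

check_mutation_table(mutation_table)
```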

reference_signatures

Reference mutation signatures. This can be either the output of get_reference_signatures or a custom data frame formatted equivalently.

subset_signatures

Boolean. If TRUE (default), then subset_reference_signatures is run to pre-select a smaller subset of signatures most likely to be active in the cancer. This helps to reduce processing time and model complexity, but may bias the result.

n_populations

The number of populations to screen for. Must be an integer. If no value is provided, then a model selection step is engaged to automatically estimate the number of populations. The automatic model selection uses select_n_populations, which performs a maximum a posteriori estimate using the SignIT population model (without mutation signature inference).

genome

A BSgenome object. This is used to determine trinucleotide contexts of mutations to define mutation types. By default, uses BSgenome.Hsapiens.UCSC.hg19. To define custom mutation types, simply include a column named mutation_type in mutation_table, in which case this parameter is ignored.

method

The posterior sampling method. This is a string and can be either 'vb' for automatic variational Bayes or 'mcmc' for Hamiltonian Monte Carlo.

n_chains

Number of chains to sample. Only relevant if method == 'mcmc'.

n_cores

Number of cores for parallel sampling. The Usage signature sets the default to 1, so set this explicitly (for example, to the number of chains) to sample chains in parallel. Only relevant if method == 'mcmc'.

n_iter

Number of sampling iterations per chain. These are distinct from adaptation iterations, so the total number of iterations will be n_iter + n_adapt. Only relevant if method == 'mcmc'.

n_adapt

Number of adaptation iterations per chain. Only relevant if method == 'mcmc'.
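The relationship between n_iter, n_adapt, and n_chains can be worked through with the default values from the Usage signature. The per-chain total follows directly from the n_iter description. The count of retained draws assumes that adaptation iterations are discarded and all sampling iterations from every chain are kept, which is the usual Stan behaviour but is not stated on this page.

```r
# Defaults taken from the Usage signature.
n_chains <- 10
n_iter   <- 300
n_adapt  <- 200

# Total iterations run by each chain (adaptation plus sampling).
iterations_per_chain <- n_iter + n_adapt
print(iterations_per_chain)  # 500

# Assumption: only sampling iterations are retained, pooled across chains.
retained_draws <- n_chains * n_iter
print(retained_draws)        # 3000
```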

Details

get_population_signatures is the central function for Bayesian inference of mutational populations and signatures. The model infers an L x N matrix of parameters, where L is the number of populations and N is the number of signatures. The posterior distribution of each parameter is estimated using either automatic differentiation variational inference or Hamiltonian Monte Carlo, via the vb and sampling methods of the rstan package, respectively (rstan is an interface to the Stan probabilistic programming language).
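The choice between the two inference methods might be wrapped as in the sketch below. It assumes the package is installed under the name signit (an assumption, not stated on this page) and that mutation_table already carries whatever columns the function needs to determine mutation types. run_signit is a hypothetical wrapper, and the MCMC settings shown are illustrative rather than recommended.

```r
# Hypothetical wrapper (not part of SignIT) that selects the inference method.
run_signit <- function(mutation_table, method = c("vb", "mcmc")) {
  method <- match.arg(method)

  # Assumption: the package namespace is called "signit".
  if (!requireNamespace("signit", quietly = TRUE)) {
    stop("package 'signit' is not installed")
  }

  if (method == "vb") {
    # Variational Bayes: the chain and iteration arguments are not relevant.
    signit::get_population_signatures(mutation_table, method = "vb")
  } else {
    # Hamiltonian Monte Carlo: one core per chain, illustrative settings.
    signit::get_population_signatures(
      mutation_table,
      method   = "mcmc",
      n_chains = 4,
      n_cores  = 4,
      n_iter   = 300,
      n_adapt  = 200
    )
  }
}
```

Variational Bayes is the default in the Usage signature. Switching to 'mcmc' brings the n_chains, n_cores, n_iter, and n_adapt arguments into play.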

Value

A list object with the posterior sampling of population signatures plus relevant input and metadata.


eyzhao/SignIT documentation built on Dec. 6, 2019, 11:45 a.m.