
View source: R/RunSomaticSignatures.R

RunSomaticSignatures    R Documentation

Run SomaticSignatures.NMF extraction and attribution on a spectra catalog file

Description

Run SomaticSignatures.NMF extraction and attribution on a spectra catalog file

Usage

RunSomaticSignatures(
  input.catalog,
  out.dir,
  CPU.cores = NULL,
  seedNumber = 1,
  K.exact = NULL,
  K.range = NULL,
  nrun.est.K = 30,
  nrun.extract = 1,
  pConstant = NULL,
  save.diag = FALSE,
  test.only = FALSE,
  overwrite = FALSE
)

Arguments

input.catalog

Path to the file containing the input spectra catalog (ICAMS format). Columns are samples (tumors); rows are mutation types.

out.dir

Directory that will be created for the output. The function aborts if the directory already exists, unless overwrite = TRUE. Log files are written to paste0(out.dir, "/tmp").
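A minimal sketch of the output-directory rules described above, assuming that overwrite = TRUE simply allows an existing out.dir to be reused. The helper name prepare_out_dir is hypothetical and is not part of SynSigRun.

```r
# Hypothetical helper, not SynSigRun code: mirrors the documented out.dir rules.
prepare_out_dir <- function(out.dir, overwrite = FALSE) {
  # Abort if the output directory exists and overwriting was not requested
  if (dir.exists(out.dir) && !overwrite) {
    stop("Output directory already exists: ", out.dir)
  }
  # Log files go to paste0(out.dir, "/tmp"), as documented;
  # recursive = TRUE also creates out.dir itself
  tmp.dir <- paste0(out.dir, "/tmp")
  dir.create(tmp.dir, recursive = TRUE, showWarnings = FALSE)
  invisible(tmp.dir)
}
```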

CPU.cores

Number of CPU cores used to run SomaticSignatures.NMF. On a server, 30 cores would be a good choice; on a PC, 2-4 cores may be all you can use. By default (CPU.cores = NULL), half of the available cores are used, i.e. parallel::detectCores() / 2.
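The documented default can be sketched as below. The helper name resolve_cpu_cores is hypothetical; the fallback to 2 cores when detectCores() returns NA, the floor(), and the lower bound of 1 are assumptions added here so the result is always a usable whole number.

```r
# Hypothetical helper, not SynSigRun code: resolves the documented default.
resolve_cpu_cores <- function(CPU.cores = NULL) {
  if (!is.null(CPU.cores)) return(CPU.cores)
  n <- parallel::detectCores()
  # detectCores() can return NA on some platforms; assume 2 cores in that case
  if (is.na(n)) n <- 2
  # Documented default is detectCores() / 2; floor() and the minimum of 1
  # are added here so the result is a whole number of cores
  max(1, floor(n / 2))
}
```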

seedNumber

Seed for the pseudo-random number generator used by SomaticSignatures. Setting the seed makes the SomaticSignatures attribution reproducible. Default: 1.

K.exact, K.range

K.exact is the exact number of signatures (K) active in the spectra. Specify K.exact if you know exactly how many signatures are active in input.catalog, the ICAMS-formatted spectra file.

K.range is a numeric vector c(K.min, K.max) of length 2; SomaticSignatures.NMF searches this range for the best number of active signatures, K. Specify K.range if you do not know how many signatures are active in input.catalog.

WARNING: You must specify only one of K.exact or K.range!

Default: NULL
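A sketch of how the "exactly one of K.exact or K.range" rule could be checked before any NMF work starts. The validator check_K_args is hypothetical; the minimum span of 3 Ks follows the requirement stated under Details, and returning the candidate Ks as an integer sequence is an illustrative choice.

```r
# Hypothetical validator, not SynSigRun code.
check_K_args <- function(K.exact = NULL, K.range = NULL) {
  # Both NULL or both set: reject
  if (is.null(K.exact) == is.null(K.range)) {
    stop("Specify exactly one of K.exact or K.range")
  }
  if (!is.null(K.range)) {
    if (!is.numeric(K.range) || length(K.range) != 2 || K.range[1] > K.range[2]) {
      stop("K.range must be a numeric vector c(K.min, K.max) with K.min <= K.max")
    }
    # The elbow-point search needs RSS at >= 3 values of K (see Details)
    if (K.range[2] - K.range[1] + 1 < 3) {
      stop("K.range must span at least 3 values of K")
    }
    return(seq(K.range[1], K.range[2]))  # candidate Ks to assess
  }
  K.exact
}
```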

nrun.est.K

Number of NMF runs for each candidate number of signatures. These runs are used to estimate the most plausible number of signatures in the input spectra catalog.

nrun.extract

Number of NMF runs used to extract signatures and infer exposures.
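To make nrun.extract and seedNumber concrete, here is a base-R sketch of Lee and Seung's multiplicative-update NMF (Frobenius loss), restarted nrun times from random initial matrices under a fixed seed, keeping the run with the smallest RSS. This is a swapped-in illustration, not the SomaticSignatures.NMF implementation; the function name nmf_best_of, the iteration count, and the keep-the-best rule are assumptions.

```r
# Hypothetical illustration, not SomaticSignatures.NMF code.
# V: nonnegative matrix (rows = mutation types, columns = samples); K: rank.
nmf_best_of <- function(V, K, nrun = 1, seed = 1, iter = 500, eps = 1e-9) {
  set.seed(seed)  # fixed seed -> identical random starts -> repeatable result
  best <- NULL
  for (r in seq_len(nrun)) {
    W <- matrix(runif(nrow(V) * K), nrow(V), K)  # signatures
    H <- matrix(runif(K * ncol(V)), K, ncol(V))  # exposures
    for (i in seq_len(iter)) {
      # Lee-Seung multiplicative updates for the Frobenius (RSS) objective;
      # eps guards against division by zero
      H <- H * crossprod(W, V) / (crossprod(W, W %*% H) + eps)
      W <- W * tcrossprod(V, H) / (W %*% tcrossprod(H) + eps)
    }
    rss <- sum((V - W %*% H)^2)
    # Keep the run with the lowest residual sum of squares
    if (is.null(best) || rss < best$rss) best <- list(W = W, H = H, rss = rss)
  }
  best
}
```

Each additional run repeats a full factorization, so the total number of factorizations during K estimation grows as nrun.est.K times the number of Ks in K.range.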

pConstant

A small positive value (a.k.a. pseudocount) to add to every entry in input.catalog. Specify a value ONLY if a "non-conformable arrays" error is raised. Default: NULL (no pseudocount is added).
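A minimal sketch of the pseudocount step, assuming the catalog has already been read into a numeric matrix. The helper add_pseudocount is hypothetical; it only shows that NULL leaves the data untouched while a positive pConstant shifts every entry, including zeros.

```r
# Hypothetical illustration, not SynSigRun code.
add_pseudocount <- function(catalog, pConstant = NULL) {
  if (is.null(pConstant)) return(catalog)  # default: leave data untouched
  stopifnot(is.numeric(pConstant), pConstant > 0)
  catalog + pConstant                      # added to every entry
}
```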

save.diag

If TRUE, save objects of class MutationalSignatures, which store the full results of multiple NMF decomposition runs, to the following files:

  • assess.K.pdf: RSS and explained variance at each K in K.range. Used for manually selecting the number of signatures (K).

  • assess.K.Rdata: full results for each K in K.range. Used for diagnosing goodness of fit and stability.

  • extract.given.K.Rdata: full results when K is specified by K.exact or selected by the elbow-point method. Used for diagnosing the accuracy of signature extraction.

Set to TRUE for diagnostics; set to FALSE for a cleaner output directory. Default: FALSE.
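assess.K.pdf reports RSS and explained variance at each K. As a hedged sketch, both quantities can be computed for any factorization V ≈ W %*% H as below. The explained-variance definition 1 - RSS / sum(V^2) follows the NMF package's evar(); whether SomaticSignatures plots exactly this quantity is an assumption, and the helper name fit_stats is hypothetical.

```r
# Hypothetical helper: goodness-of-fit measures for V ~ W %*% H.
fit_stats <- function(V, W, H) {
  rss <- sum((V - W %*% H)^2)  # residual sum of squares
  # Explained variance as in the NMF package's evar(): 1 - RSS / sum(V^2)
  c(RSS = rss, evar = 1 - rss / sum(V^2))
}
```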

test.only

If TRUE, only analyze the first 10 columns read in from input.catalog. Default: FALSE
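The test.only subset can be sketched as below, assuming the catalog is a matrix with samples in columns. The helper name maybe_subset is hypothetical, and taking all columns when fewer than 10 are present is an assumption about edge-case behaviour.

```r
# Hypothetical illustration of the documented test.only behaviour.
maybe_subset <- function(catalog, test.only = FALSE) {
  if (!test.only) return(catalog)
  # Keep only the first 10 samples (columns); drop = FALSE keeps a matrix
  catalog[, seq_len(min(10, ncol(catalog))), drop = FALSE]
}
```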

overwrite

If TRUE, overwrite existing output. Default: FALSE

Details

SomaticSignatures.NMF uses the approach of Hutchins et al. (2008) to estimate K: it selects the first inflection point of the residual sum of squares (RSS) as a function of K, i.e. the smallest K at which the second derivatives of RSS at its neighbouring Ks have opposite signs.
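Because K takes integer values, the second derivative in this rule is naturally approximated by the central second difference of the RSS curve. The standard finite-difference formula is shown below as an illustration, not taken from the SomaticSignatures source. Each value needs RSS at three consecutive Ks, and an inflection point lies between consecutive Ks at which this quantity has opposite signs.

```latex
\Delta^2 \mathrm{RSS}(K) = \mathrm{RSS}(K+1) - 2\,\mathrm{RSS}(K) + \mathrm{RSS}(K-1)
```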

Estimating the second derivative of RSS requires RSS values at three or more consecutive integers, so at least 3 Ks must be assessed.
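A sketch of the elbow-point rule, assuming RSS has already been computed at consecutive integer Ks. The function first_inflection_K is hypothetical. Reporting the smaller K of the first pair whose second differences change sign is one reading of "neighbouring Ks", and returning NA when no sign change exists is an assumption. Seeing a sign change needs at least two second differences, so this sketch needs at least 4 Ks.

```r
# Hypothetical sketch, not SynSigRun or SomaticSignatures code.
# Ks: consecutive integer Ks; rss: RSS at each K (same order).
first_inflection_K <- function(Ks, rss) {
  stopifnot(length(Ks) == length(rss), all(diff(Ks) == 1))
  n <- length(rss)
  if (n < 4) return(NA_integer_)  # need >= 2 second differences
  # Discrete second derivative at the interior Ks[2], ..., Ks[n - 1]
  d2 <- rss[3:n] - 2 * rss[2:(n - 1)] + rss[1:(n - 2)]
  # Positions j where d2[j] and d2[j + 1] have opposite signs
  flips <- which(d2[-length(d2)] * d2[-1] < 0)
  if (length(flips) == 0) return(NA_integer_)  # no inflection point found
  # d2[j] belongs to Ks[j + 1]: report the smaller K of the first flipping pair
  as.integer(Ks[flips[1] + 1])
}
```

For example, with Ks = 2:7 and RSS values 100, 60, 45, 40, 20, 18, the second differences are 25, 10, -15, 18; the first sign change is between K = 4 and K = 5, so the sketch returns 4.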

Value

A list, returned invisibly, containing the results of SomaticSignatures.NMF:

  • $signature: the extracted signatures;

  • $exposure: the inferred exposures.
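A hedged usage example, assuming RunSomaticSignatures is exported from the SynSigRun package and that "my.catalog.csv" (a hypothetical path) is an ICAMS-formatted catalog. The call is guarded so that it is skipped when either is missing. Because the result is returned invisibly, it must be assigned to be inspected; the matrix orientations noted in the comments are assumptions based on the catalog layout (rows = mutation types, columns = samples).

```r
# Example call; the file and directory names below are hypothetical.
catalog.file <- "my.catalog.csv"
if (requireNamespace("SynSigRun", quietly = TRUE) && file.exists(catalog.file)) {
  res <- SynSigRun::RunSomaticSignatures(
    input.catalog = catalog.file,
    out.dir = "SomaticSignatures.out",
    K.range = c(2, 8),  # search for K; do not also set K.exact
    nrun.est.K = 30,
    seedNumber = 1
  )
  # Returned invisibly, so inspect the assigned object explicitly
  print(dim(res$signature))  # assumed: mutation types x K
  print(dim(res$exposure))   # assumed: K x samples
}
```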

References

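Hutchins LN, Murphy SM, Singh P, Graber JH (2008). Position-dependent motif characterization using non-negative matrix factorization. Bioinformatics 24(23):2684-2690. http://dx.doi.org/10.1093/bioinformatics/btn526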


WuyangFF95/SynSigRun documentation built on Oct. 7, 2022, 1:16 p.m.