SignatureExtraction: Mutational Signatures Extraction

View source: R/SignatureExtractionLib.R

SignatureExtraction    R Documentation

Mutational Signatures Extraction

Description

Performs signature extraction by applying non-negative matrix factorisation (NMF) to the input matrix. Multiple NMF runs and bootstrapping are used for robustness, followed by clustering of the solutions. A range of numbers of signatures to try must be provided (see nsig).
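As a rough illustration of the bootstrapping step, the sketch below resamples each sample (column) of a catalogue from a multinomial distribution with the same total count. The resampling scheme and the helper name bootstrap_catalogue are assumptions made for illustration; this page does not state how the library generates its bootstrapped catalogues.

```r
# Hypothetical helper (not part of signature.tools.lib): resample each column
# of a count matrix while keeping the column total fixed.
bootstrap_catalogue <- function(cat) {
  boot <- apply(cat, 2, function(counts) {
    total <- sum(counts)
    if (total == 0) return(counts)  # nothing to resample in an empty sample
    as.vector(rmultinom(1, size = total, prob = counts / total))
  })
  dimnames(boot) <- dimnames(cat)
  boot
}

set.seed(1)
toy_cat <- matrix(c(10, 0, 5, 20, 3, 7), nrow = 3,
                  dimnames = list(paste0("R", 1:3), c("S1", "S2")))
boot_cat <- bootstrap_catalogue(toy_cat)
```

Each bootstrapped catalogue keeps the dimensions and per-sample totals of the original, so NMF can be run on it in the same way as on the input matrix.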

Usage

SignatureExtraction(
  cat,
  outFilePath,
  matrix_of_fixed_signatures = NULL,
  blacklist = c(),
  nrepeats = 10,
  nboots = 20,
  clusteringMethod = "MC",
  completeLinkageFlag = FALSE,
  useMaxMatching = TRUE,
  filterBestOfEachBootstrap = TRUE,
  filterBest_RTOL = 0.001,
  filterBest_nmaxtokeep = 10,
  nparallel = 1,
  nsig = c(3:15),
  mut_thr = 0,
  type_of_extraction = "subs",
  project = "extraction",
  parallel = FALSE,
  nmfmethod = "brunet",
  removeDuplicatesInCatalogue = FALSE,
  removeDuplicatesThreshold = 0.98,
  normaliseCatalogue = FALSE,
  plotCatalogue = FALSE,
  plotResultsFromAllClusteringMethods = TRUE
)

Arguments

cat

matrix with samples as columns and channels as rows

outFilePath

path where the extraction output files should be written. Remember to add "/" at the end of the path
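Because outFilePath is expected to end with "/", a small wrapper can normalise the path before the call. The helper name prepare_out_path is hypothetical, and creating the directory up front is an assumption about convenient usage rather than something this page says is required.

```r
# Hypothetical helper: create the output directory and return its path with
# exactly one trailing "/".
prepare_out_path <- function(path) {
  path <- sub("/+$", "", path)  # strip any trailing slashes
  dir.create(path, showWarnings = FALSE, recursive = TRUE)
  paste0(path, "/")
}

out_dir <- prepare_out_path(file.path(tempdir(), "extraction_test_subs"))
```

The returned value can then be passed as outFilePath.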

matrix_of_fixed_signatures

matrix with known signatures as columns and channels as rows. Used for partial extraction with the NNLM package, with Lee KLD (brunet) only. If NULL, the NMF package is used instead and different NMF methods can be chosen (see nmfmethod).

blacklist

list of samples (column names) to ignore

nrepeats

number of NMF runs for each bootstrap. If filterBestOfEachBootstrap=TRUE with default parameters, at most 10 runs within 0.1 percent of the best are considered, so nrepeats should be at least 10

nboots

how many bootstrapped catalogues to use

clusteringMethod

clustering method, one of "HC" (hierarchical clustering), "PAM" (partitioning around the medoids) or "MC" (matched clustering)

completeLinkageFlag

if clusteringMethod="HC", use complete linkage instead of default average linkage
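The difference between average and complete linkage can be sketched with base R's hclust. The helper name cluster_signatures_hc, the toy signatures, and the use of 1 minus cosine similarity as the distance are all assumptions for illustration; this page does not describe the library's internal clustering code.

```r
# Hypothetical helper: cluster the columns of a matrix of signatures pooled
# from several runs into k groups with hierarchical clustering.
cluster_signatures_hc <- function(sigs, k, completeLinkageFlag = FALSE) {
  norms <- sqrt(colSums(sigs^2))
  cos_sim <- crossprod(sigs) / outer(norms, norms)
  # assumption: 1 - cosine similarity is used as the distance
  d <- as.dist(1 - cos_sim)
  linkage <- if (completeLinkageFlag) "complete" else "average"
  cutree(hclust(d, method = linkage), k = k)
}

# two runs that each found the same two (slightly perturbed) signatures
sigs <- cbind(run1_a = c(0.7, 0.2, 0.1), run1_b = c(0.1, 0.1, 0.8),
              run2_a = c(0.6, 0.3, 0.1), run2_b = c(0.1, 0.2, 0.7))
groups <- cluster_signatures_hc(sigs, k = 2)
```

In this toy case the matching signatures from the two runs end up in the same cluster under either linkage.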

useMaxMatching

if clusteringMethod="MC", use the assignment problem algorithm (match with max similarity) instead of the stable matching algorithm (any stable match)

filterBestOfEachBootstrap

if TRUE, for each bootstrap keep at most filterBest_nmaxtokeep of the nrepeats runs, chosen among the runs that are within filterBest_RTOL*best of the best run

filterBest_RTOL

relative tolerance from the best fit within which a run is considered as good as the best. RTOL=0.001 is recommended

filterBest_nmaxtokeep

maximum number of runs within the relative tolerance from the best that should be kept
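The three filterBest arguments can be read as the following selection rule. This is a minimal sketch, assuming each run has a single error value where lower is better and that the lowest-error runs are preferred when too many qualify; the helper name filter_best_runs and both assumptions are illustrative, not taken from the library.

```r
# Hypothetical helper: given one error value per NMF run (lower is better),
# return the indices of the runs to keep.
filter_best_runs <- function(errors, rtol = 0.001, nmaxtokeep = 10) {
  best <- min(errors)
  # runs whose error is within rtol * best of the best error
  good <- which(errors - best <= rtol * best)
  # assumption: if too many runs qualify, keep the ones with the lowest error
  good <- good[order(errors[good])]
  head(good, nmaxtokeep)
}

run_errors <- c(100.00, 100.05, 100.20, 101.00, 100.09)
kept <- filter_best_runs(run_errors, rtol = 0.001, nmaxtokeep = 2)
```

With the default tolerance of 0.001, only runs within 0.1 percent of the best error are candidates, which is why nrepeats should not be smaller than filterBest_nmaxtokeep.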

nparallel

how many processing units to use

nsig

vector of numbers of signatures to try

mut_thr

mutation count threshold used to remove empty or almost empty rows and columns

type_of_extraction

choose among "subs","rearr","generic","dnv"

project

give a name to your project

parallel

set to TRUE to use parallel computation (Recommended)

nmfmethod

choose among "brunet","lee","nsNMF". This choice is passed to the NMF::nmf function

removeDuplicatesInCatalogue

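remove near-duplicate samples, i.e. samples with very high cosine similarity to another sample (the original description says 0.99, while the default removeDuplicatesThreshold in Usage is 0.98)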
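Duplicate detection by cosine similarity can be sketched as follows. The helper name find_duplicate_samples and the choice to flag the later of two similar samples are assumptions for illustration; the threshold default simply mirrors removeDuplicatesThreshold from Usage.

```r
# Hypothetical helper: flag samples (columns) that are near-duplicates of an
# earlier column, based on pairwise cosine similarity.
find_duplicate_samples <- function(cat, threshold = 0.98) {
  norms <- sqrt(colSums(cat^2))
  cos_sim <- crossprod(cat) / outer(norms, norms)
  # only compare each sample with the samples that come before it
  cos_sim[lower.tri(cos_sim, diag = TRUE)] <- 0
  # na.rm guards against all-zero columns, which give NaN similarities
  colnames(cat)[apply(cos_sim > threshold, 2, any, na.rm = TRUE)]
}

toy_cat <- cbind(S1 = c(1, 2, 3), S2 = c(2, 4, 6.1), S3 = c(3, 0, 1))
dups <- find_duplicate_samples(toy_cat)
```

Here S2 is almost a scaled copy of S1, so it is the only sample flagged.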

normaliseCatalogue

scale samples to sum to 1

plotCatalogue

also plot the catalogue. This may crash the library if the catalogue is too big; it should work up to ~300 samples

plotResultsFromAllClusteringMethods

if TRUE, all clustering methods are used and results are reported and plotted for all of them. If FALSE, only the requested clustering is reported
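Several of the arguments above (blacklist, mut_thr, normaliseCatalogue) describe preprocessing of the catalogue. The sketch below shows one way these steps could be combined. The helper name preprocess_catalogue, the order of the steps, and the rule that rows and columns with a total less than or equal to mut_thr are dropped are assumptions, not the library's documented behaviour.

```r
# Hypothetical helper illustrating the preprocessing options; this is not
# the library's internal code.
preprocess_catalogue <- function(cat, blacklist = c(), mut_thr = 0,
                                 normalise = FALSE) {
  # drop blacklisted samples (matched on column names)
  cat <- cat[, !(colnames(cat) %in% blacklist), drop = FALSE]
  # assumption: rows/columns whose total is <= mut_thr are removed
  cat <- cat[rowSums(cat) > mut_thr, , drop = FALSE]
  cat <- cat[, colSums(cat) > mut_thr, drop = FALSE]
  # optionally scale every sample (column) to sum to 1
  if (normalise) cat <- sweep(cat, 2, colSums(cat), "/")
  cat
}

toy_cat <- matrix(c(4, 0, 6,
                    0, 0, 0,
                    1, 0, 3,
                    5, 0, 5), nrow = 3,
                  dimnames = list(paste0("R", 1:3), paste0("S", 1:4)))
clean <- preprocess_catalogue(toy_cat, blacklist = "S3", normalise = TRUE)
```

In this toy catalogue the blacklisted sample S3, the empty sample S2 and the empty channel R2 are removed, and the remaining samples are scaled to sum to 1.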

Value

result files will be available in the outFilePath directory

Examples

  library(signature.tools.lib)

  # random count matrix: 96 channels (rows) x 50 samples (columns)
  n_row <- 96
  n_col <- 50
  rnd_matrix <- round(matrix(runif(n_row * n_col, min = 0, max = 50),
                             nrow = n_row, ncol = n_col))
  colnames(rnd_matrix) <- paste0("C", seq_len(n_col))
  row.names(rnd_matrix) <- paste0("R", seq_len(n_row))

  # quick test run: 2 bootstraps, 10 NMF runs each, trying 2 and 3 signatures
  SignatureExtraction(cat = rnd_matrix,
                      outFilePath = "extraction_test_subs/",
                      nrepeats = 10,
                      nboots = 2,
                      nparallel = 2,
                      nsig = 2:3,
                      mut_thr = 0,
                      type_of_extraction = "subs",
                      project = "test",
                      parallel = TRUE,
                      nmfmethod = "brunet")

Nik-Zainal-Group/signature.tools.lib documentation built on April 13, 2025, 5:50 p.m.