run: Runs SigMA: (1) calculates likelihood, cosine similarity,...

View source: R/run.R

run    R Documentation

Description

Runs SigMA in three steps: (1) calculates the likelihood, cosine similarity, NNLS exposures, and likelihood of the decomposition; (2) uses these features in a multivariate analysis; (3) based on the resulting scores, makes a final decision on the presence of the signature.
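
As an illustration of one of the features named above, the following is a minimal base-R sketch of cosine similarity between a mutation spectrum and a reference signature. The function name cosine_similarity and the toy vectors are hypothetical and are not taken from the SigMA source; the package's own implementation may differ.

```r
# Cosine similarity between two numeric vectors of equal length.
# Illustrative helper only; not part of the SigMA API.
cosine_similarity <- function(a, b) {
  stopifnot(length(a) == length(b))
  sum(a * b) / sqrt(sum(a^2) * sum(b^2))
}

# Toy 4-channel vectors (assumed values); SNV spectra are commonly
# described over 96 trinucleotide channels.
spectrum  <- c(10, 0, 5, 5)            # mutation counts per channel
signature <- c(0.5, 0.0, 0.25, 0.25)   # reference signature probabilities

similarity <- cosine_similarity(spectrum, signature)
```

Because the toy spectrum is proportional to the signature, the similarity is 1 up to floating-point rounding; vectors with no shared channels give 0.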

Usage

run(genome_file, output_file = NULL, do_assign = T, data = "msk",
  tumor_type = "breast", do_mva = T, check_msi = F, weight_cf = F,
  lite_format = F, add_sig3 = F)

Arguments

genome_file

a CSV file with SNV spectra information; it can be created from a VCF file using the make_genome_matrix() function (see ?make_genome_matrix)

output_file

the output file name; can be NULL, in which case the input file name is used with "_output" appended

do_assign

a boolean for whether a cutoff should be applied to determine the final decision, or whether only the features should be returned

data

the options are "msk" (for a panel similar in size to the MSK-Impact panel with 410 genes), "seqcap" (for whole exome sequencing), "seqcap_probe" (64 Mb SeqCap EZ Probe v3), or "wgs" (for whole genome sequencing)

tumor_type

the options are "bladder", "bone_other" (Ewing's sarcoma or Chordoma), "breast", "crc", "eso", "gbm", "lung", "lymph", "medullo", "osteo", "ovary", "panc_ad", "panc_en", "prost", "stomach", "thy", or "uterus". The exact correspondence of these names to tumor types can be found at https://github.com/parklab/SigMA

do_mva

a boolean for whether multivariate analysis should be run

check_msi

a boolean that determines whether to identify tumors with microsatellite instability (MSI)

weight_cf

a boolean that determines whether the likelihood calculation takes into account the number of tumors in each cluster: when it is F, the clusters get equal weights; when it is T, they are weighted according to the fraction of tumors in each cluster

lite_format

saves the output in a lite format when set to true

add_sig3

should be set to T to calculate the likelihood of Signature 3 for tumor types in which Signature 3 was not discovered by NMF in their WGS data
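
The two weighting schemes described for weight_cf can be sketched as follows. This is only an illustration of equal versus fraction-based cluster weights: the function combine_cluster_likelihoods, its inputs, and the assumption that per-cluster likelihoods are combined as a weighted sum are all hypothetical and are not confirmed by this page.

```r
# Combine per-cluster likelihoods under the two weighting schemes.
# Hypothetical helper; the weighted-sum combination is an assumption.
combine_cluster_likelihoods <- function(likelihoods, n_tumors, weight_cf = FALSE) {
  stopifnot(length(likelihoods) == length(n_tumors))
  weights <- if (weight_cf) {
    n_tumors / sum(n_tumors)                      # fraction of tumors per cluster
  } else {
    rep(1 / length(n_tumors), length(n_tumors))   # equal weight per cluster
  }
  sum(weights * likelihoods)
}

# Toy inputs (assumed values): two clusters with 30 and 10 tumors.
likelihoods <- c(0.2, 0.6)
n_tumors    <- c(30, 10)

equal_weighted    <- combine_cluster_likelihoods(likelihoods, n_tumors, weight_cf = FALSE)
fraction_weighted <- combine_cluster_likelihoods(likelihoods, n_tumors, weight_cf = TRUE)
```

With equal weights, a small cluster influences the result as much as a large one; with fraction-based weights, the larger cluster dominates.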

Examples

run(genome_file = "input_genomes.csv", 
    data = "msk",
    tumor_type = "ovary")
run(genome_file = "input_genomes.csv", 
    data = "seqcap", 
    tumor_type = "bone_other")
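
A further sketch, using only arguments listed in the Usage section, shows how the optional flags might be combined to return the features without applying the final cutoff. The file names are placeholders, and the call is guarded so that it only runs when the SigMA package is installed and the input file exists.

```r
# Argument names come from the Usage section; the file names are placeholders.
sigma_args <- list(
  genome_file = "input_genomes.csv",
  output_file = "sigma_features.csv",
  data        = "seqcap",
  tumor_type  = "breast",
  do_assign   = FALSE,   # return features only, without the final decision
  do_mva      = TRUE,    # still run the multivariate analysis
  check_msi   = TRUE     # also look for microsatellite instability
)

# Only call run() when the package and the input file are actually available.
if (requireNamespace("SigMA", quietly = TRUE) && file.exists(sigma_args$genome_file)) {
  do.call(SigMA::run, sigma_args)
}
```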

parklab/SigMA documentation built on Feb. 10, 2024, 6:59 p.m.