ActiveDriverWGS R Documentation

ActiveDriverWGS is a driver discovery tool for simple somatic mutations in cancer whole genomes

Description

ActiveDriverWGS is a driver discovery tool for simple somatic mutations in cancer whole genomes

Usage

ActiveDriverWGS(
  mutations,
  elements,
  sites = NULL,
  window_size = 50000,
  filter_hyper_MB = 30,
  recovery.dir = NULL,
  mc.cores = 1,
  ref_genome = "hg19",
  detect_depleted_mutations = FALSE
)

Arguments

mutations

A data frame containing the following columns: chr, pos1, pos2, ref, alt, patient.

chr

autosomal chromosomes as chr1 to chr22 and sex chromosomes as chrX and chrY

pos1

the start position of the mutation in 1-based coordinates

pos2

the end position of the mutation in 1-based coordinates

ref

the reference allele as a string containing the bases A, T, C, G or -

alt

the alternate allele as a string containing the bases A, T, C, G or -

patient

the patient identifier as a string

elements

A data frame containing the following columns: chr, start, end, id

chr

autosomal chromosomes as chr1 to chr22 and sex chromosomes as chrX and chrY

start

the start position of the element in 0-based coordinates (BED format)

end

the end position of the element in 0-based coordinates (BED format)

id

the element identifier. If the element contains multiple segments, such as exons, each segment should be a separate row with the segment coordinates and the element identifier as id. Elements can be coding or noncoding, such as exons of protein-coding genes or active enhancers.

sites

A data frame containing the following columns: chr, start, end, id

chr

autosomal chromosomes as chr1 to chr22 and sex chromosomes as chrX and chrY

start

the start position of the site in 0-based coordinates (BED format)

end

the end position of the site in 0-based coordinates (BED format)

id

the identifier of the element. The ids must match those listed in the elements data frame.
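
The three input tables described above are ordinary data frames and can be assembled by hand. The sketch below is illustrative only: the coordinates, alleles, patient labels and the element names "GENE_A" and "GENE_B" are invented, and stringsAsFactors = FALSE is an assumption made to keep the string columns as character vectors.

```r
# Hypothetical toy inputs; every value below is invented for illustration.

# Mutations: one row per mutation, positions in 1-based coordinates.
mutations <- data.frame(
  chr     = c("chr1", "chr1", "chrX"),
  pos1    = c(1050, 1200, 500),
  pos2    = c(1050, 1200, 500),
  ref     = c("A", "C", "G"),
  alt     = c("T", "G", "C"),
  patient = c("P1", "P2", "P1"),
  stringsAsFactors = FALSE
)

# Elements: BED-style coordinates; GENE_A has two segments that share one id.
elements <- data.frame(
  chr   = c("chr1", "chr1", "chrX"),
  start = c(1000, 2000, 400),
  end   = c(1300, 2200, 600),
  id    = c("GENE_A", "GENE_A", "GENE_B"),
  stringsAsFactors = FALSE
)

# Sites: smaller regions of interest, labelled with the id of their element.
sites <- data.frame(
  chr   = "chr1",
  start = 1040,
  end   = 1060,
  id    = "GENE_A",
  stringsAsFactors = FALSE
)

# Sanity check: every site id must refer to an id present in elements.
stopifnot(all(sites$id %in% elements$id))
```

Keeping each segment of a multi-segment element on its own row, as GENE_A does here, follows the id convention described for elements.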

window_size

An integer indicating the size, in base pairs, of the background window used to establish the expected mutation rate and the corresponding null model. The default is 50000 bp.

filter_hyper_MB

Hyper-mutated samples carry many passenger mutations and dilute the signal of true drivers. Samples with a rate greater than filter_hyper_MB mutations per megabase are excluded. The default is 30 mutations per megabase.
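
As a rough illustration of this filter, the sketch below counts mutations per patient, converts the counts to a per-megabase rate, and drops patients above the threshold. The helper name drop_hypermutated and the genome size of 3000 Mb are assumptions made for the example; this is not the package's internal implementation.

```r
# Hypothetical helper: remove patients whose mutation rate exceeds a threshold.
# genome_mb is an assumed genome size in megabases, not a package parameter.
drop_hypermutated <- function(mutations, filter_hyper_MB = 30, genome_mb = 3000) {
  counts <- table(mutations$patient)
  # Mutations per megabase for each patient
  rate_per_mb <- as.vector(counts) / genome_mb
  keep <- names(counts)[rate_per_mb <= filter_hyper_MB]
  mutations[mutations$patient %in% keep, , drop = FALSE]
}

# Toy data: P1 has 5 mutations and P2 has 1. With a 1 Mb "genome" and a
# threshold of 3 mutations per Mb, P1 is excluded and P2 is kept.
toy <- data.frame(patient = c(rep("P1", 5), "P2"), stringsAsFactors = FALSE)
filtered <- drop_hypermutated(toy, filter_hyper_MB = 3, genome_mb = 1)
```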

recovery.dir

The directory for storing recovery files. If the directory does not exist, ActiveDriverWGS will create the directory. If the parameter is unspecified, recovery files will not be saved. As an ActiveDriverWGS query for large datasets may be computationally heavy, specifying a recovery directory will recover previously computed results if a query is interrupted.

mc.cores

The number of cores to use when multiple cores are available. The default is 1.

ref_genome

The reference genome used in the analysis. The default is "hg19"; other options are "hg38", "mm9" and "mm10".

detect_depleted_mutations

If TRUE, detect elements with significantly fewer mutations than expected. The default is FALSE.

Value

A data frame with the results of driver discovery, containing the following columns: id, pp_element, element_muts_obs, element_muts_exp, element_enriched, pp_site, site_muts_obs, site_muts_exp, site_enriched, fdr_element, fdr_site, has_site_mutations

id

A string identifying the element of interest

pp_element

The p-value of the element

element_muts_obs

The number of patients with a mutation in the element

element_muts_exp

The expected number of patients with a mutation in the element with respect to background

element_enriched

A boolean indicating whether the element is enriched in mutations

pp_site

The p-value of the site

site_muts_obs

The number of patients with a mutation in the site

site_muts_exp

The expected number of patients with a mutation in the site with respect to element

site_enriched

A boolean indicating whether the site is enriched in mutations

fdr_element

The FDR corrected p-value of the element

fdr_site

The FDR corrected p-value of the site

has_site_mutations

A "V" indicates the presence of site mutations
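
A common follow-up is to keep the elements that pass an FDR threshold and rank them by significance. The sketch below does this on a small mock table that reuses a few of the column names listed above; the values and the 0.05 cutoff are illustrative assumptions, not output of the package.

```r
# Mock result table with a subset of the documented columns (invented values).
mock_result <- data.frame(
  id               = c("GENE_A", "GENE_B", "GENE_C"),
  pp_element       = c(1e-6, 0.2, 0.001),
  fdr_element      = c(3e-6, 0.2, 0.0015),
  element_enriched = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Keep enriched elements below the assumed FDR cutoff of 0.05.
is_hit <- mock_result$fdr_element < 0.05 & mock_result$element_enriched
significant <- mock_result[is_hit, ]

# Rank the remaining elements from most to least significant.
significant <- significant[order(significant$fdr_element), ]
```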

Examples


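# Load the example data sets bundled with the package
data(cancer_genes)
data(cll_mutations)

# Restrict the analysis to a small subset of genes
some_genes = c("ATM", "MYD88", "NOTCH1", "SF3B1", "XPO1",
               "SOCS1", "CNOT3", "DDX3X", "KMT2A", "HIF1A", "APC")

# Run driver discovery on the selected elements
result = ActiveDriverWGS(mutations = cll_mutations,
                         elements = cancer_genes[cancer_genes$id %in% some_genes, ])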


ActiveDriverWGS documentation built on Sept. 3, 2022, 5:05 p.m.