identify_hotspots: Identify hotspots

View source: R/identify_hotspots.R

identify_hotspots    R Documentation

Identify hotspots

Description

Identifies protein residues that are mutation hotspots.

Usage

  identify_hotspots(mutation_dataset, gene_data,
  snp_data, min_n_muts = 5, MAF_thresh = 0.01, flanking_region = c(200, 300), 
  poisson.thr = 0.01, percentage.thr = 0.15, ratio.thr = 45, approach = "percentage")

Arguments

mutation_dataset

Object containing a table with the mutation data (e.g. TCGA point mutations mapped to protein level).

gene_data

Data frame or TxDb object containing Ensembl gene annotations: gene identifiers and representative transcript cDNA length.

snp_data

Object containing a table, or a VCF object, with information on population SNPs.

min_n_muts

Numeric vector defining a minimum number of mutations that need to occur at the same residue. Default: 5

MAF_thresh

Numeric vector that defines the minor allele frequency (MAF) threshold for considering reported mutations as population SNPs. Default: 0.01

flanking_region

Numeric vector that defines the size of the window around the mutation that will be considered. Up to two window sizes are allowed. Default: c(200, 300)

poisson.thr

Numeric vector that defines a threshold for the adjusted p-value. Residues with an associated p-value lower than this value are reported. Default: 0.01

percentage.thr

Number defining the fraction of mutations within the window that need to fall on a single residue in order for it to be classified as a hotspot. Default: 0.15

ratio.thr

Number defining the requirement that the number of mutations on a single residue exceeds what would be expected by chance, given the background mutation rate in the window (i.e. the surrounding region). Default: 45

approach

Selection criterion for finding single-residue mutation clusters: percentage.thr, ratio.thr, or both. Options: "both", "percentage" or "ratio". Default: "percentage"
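
To make the threshold arguments more concrete, the following base R sketch applies them to a toy vector of per-residue mutation counts. It is not the package's implementation. The toy data, the centred window, the definition of the expected count as the window's mean count per residue, the one-sided Poisson test, the Benjamini-Hochberg adjustment and the direction of each comparison are all assumptions made for illustration.

```r
# Toy data (hypothetical): mutation counts per residue in a 400-residue protein.
counts <- integer(400)
counts[c(95, 100, 101, 110, 180)] <- c(1L, 2L, 12L, 1L, 3L)

# Thresholds, set to the documented defaults of identify_hotspots().
min_n_muts     <- 5
poisson.thr    <- 0.01
percentage.thr <- 0.15
ratio.thr      <- 45

candidate <- 101L  # residue being tested
window    <- 200L  # one flanking_region size

# Assumption: the window is centred on the candidate and clipped to the protein ends.
half <- window %/% 2L
idx  <- seq(max(1L, candidate - half), min(length(counts), candidate + half))

n_res    <- counts[candidate]  # mutations on the candidate residue
n_window <- sum(counts[idx])   # mutations anywhere in the window

# Fraction of the window's mutations that fall on this residue (12 / 19 here).
frac <- n_res / n_window

# Assumption: the count expected by chance is the window's mean count per residue.
lambda <- n_window / length(idx)
ratio  <- n_res / lambda

# One-sided Poisson test: P(X >= n_res) under the background rate.
p_val <- ppois(n_res - 1, lambda, lower.tail = FALSE)

# Assumption: Benjamini-Hochberg adjustment; with one candidate it changes nothing.
p_adj <- p.adjust(p_val, method = "BH")

# Criteria corresponding to approach = "percentage" and approach = "ratio".
passes_common  <- n_res >= min_n_muts && p_adj < poisson.thr
is_hotspot_pct <- passes_common && frac >= percentage.thr
is_hotspot_rat <- passes_common && ratio >= ratio.thr
```

In this sketch, approach = "both" would correspond to requiring is_hotspot_pct and is_hotspot_rat together.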

Value

An object containing information on the significant hotspots: associated gene and protein identifiers, number of mutations, percentage of mutations within the defined windows that map to the same residue, and associated p-values.

Author(s)

Marija Buljan <buljan@imsb.biol.ethz.ch>, Peter Blattmann <blattmann@imsb.biol.ethz.ch>

Examples


data("SnpData", package = "DominoEffect")
data("TestData", package = "DominoEffect")
data("DominoData", package = "DominoEffect")
hotspot_mutations <- identify_hotspots(mutation_dataset = TestData, 
   gene_data = DominoData, snp_data = SnpData)
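
As a variation on the call above, the sketch below spells out every threshold and requires both the percentage and the ratio criterion. The values are the documented defaults except for approach, and the names hotspot_args and hotspot_mutations_both are hypothetical. The call runs only if the DominoEffect package is installed.

```r
# Thresholds collected in one list; all are documented defaults except approach.
hotspot_args <- list(
  min_n_muts      = 5,
  MAF_thresh      = 0.01,
  flanking_region = c(200, 300),
  poisson.thr     = 0.01,
  percentage.thr  = 0.15,
  ratio.thr       = 45,
  approach        = "both"  # require percentage.thr and ratio.thr together
)

# Guarded so the snippet also runs where DominoEffect is not installed.
if (requireNamespace("DominoEffect", quietly = TRUE)) {
  library(DominoEffect)
  data("SnpData", package = "DominoEffect")
  data("TestData", package = "DominoEffect")
  data("DominoData", package = "DominoEffect")

  # Combine the data arguments with the thresholds and call the function.
  hotspot_mutations_both <- do.call(
    identify_hotspots,
    c(list(mutation_dataset = TestData,
           gene_data        = DominoData,
           snp_data         = SnpData),
      hotspot_args)
  )
}
```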

peterblattmann/DominoEffect documentation built on Nov. 9, 2023, 2:39 a.m.