RM2_downsample: Evaluating differential mutation rates across classes of...

View source: R/running_glm.R

RM2_downsample    R Documentation

Evaluating differential mutation rates across classes of sites with downsampling

Description

RM2_downsample() is a wrapper that first downsamples the sites and then calls RM2(). The sampling procedure is repeated n_iterations times. The iteration with the median p-value for total mutations (mut_class_columns = NA) is selected, and the values from that iteration are returned.
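The downsample-then-select procedure described above can be sketched in base R. This is a minimal illustration, not the package's implementation: toy_test() is a hypothetical stand-in for a single RM2() run, sampling without replacement is an assumption, and an odd number of iterations is used so that the median p-value belongs to exactly one iteration.

```r
# Hypothetical stand-in for one RM2() run: returns one row with a p-value.
# The real function fits a regression model; this only mimics the output shape.
toy_test <- function(sampled_sites) {
  data.frame(pp = runif(1), n_sites_tested = nrow(sampled_sites))
}

set.seed(1)

# Toy sites table (made-up coordinates).
sites <- data.frame(
  chr = "chr1",
  start = seq(0, 990, by = 10),
  end = seq(10, 1000, by = 10)
)

n_sites_sampled <- 20
n_iterations <- 5  # odd, so the median is an actual iteration

# Repeat the sampling and testing step n_iterations times.
results <- lapply(seq_len(n_iterations), function(i) {
  idx <- sample(nrow(sites), n_sites_sampled)  # assumption: without replacement
  toy_test(sites[idx, , drop = FALSE])
})
results <- do.call(rbind, results)

# Select the iteration whose p-value is the median and keep its values.
median_idx <- order(results$pp)[ceiling(n_iterations / 2)]
selected <- results[median_idx, ]
```

With an even number of iterations (such as the default of 100), the median falls between two p-values, so a tie-breaking rule is needed. The rule used by the package is not stated here.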

Usage

RM2_downsample(
  maf,
  sites,
  mut_class_columns = NA,
  cofactor_column = NA,
  window_size = 100,
  n_min_mut = 100,
  n_bin = 10,
  n_sites_sampled,
  n_iterations = 100
)

Arguments

maf

Data frame of mutations with the following columns:

chr

autosomal chromosomes as chr1 to chr22 and sex chromosomes as chrX and chrY

start

the start position of the mutation in 1-based coordinates

end

the end position of the mutation in 1-based coordinates

ref

the reference allele as a string containing the bases A, T, C or G

alt

the alternate allele as a string containing the bases A, T, C or G

mut_trinuc

trinucleotide context - where middle is C or T - with alternate allele

mut_strand

character indicating Watson (w) or Crick (c)

ref_alt

character indicating single-base substitution

sites

Data frame of sites with the following columns:

chr

autosomal chromosomes as chr1 to chr22 and sex chromosomes as chrX and chrY

start

the start position of the site in 0-based coordinates

end

the end position of the site in 0-based coordinates

mut_class_columns

Character corresponding to the column(s) of mutation classes for grouped analysis

cofactor_column

Character corresponding to the column of cofactors

window_size

Integer indicating the half-width of sites and flanking regions (added to left and right for full width) (default 100)

n_min_mut

Integer indicating the minimum number of mutations required to perform analysis (default 100)

n_bin

Integer indicating the number of megabase bins to use (default 10)

n_sites_sampled

Integer indicating the number of sites to sample

n_iterations

Integer indicating how many times to repeat the sampling procedure (default 100)
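The two coordinate conventions and the window_size half-width can be illustrated with a small base R sketch. All values are made up, and two points are assumptions rather than documented behaviour: the 0-based site interval is treated as half-open (BED-like), and the window is centred on the site midpoint.

```r
# Toy inputs; all values are made up for illustration.
maf_pos <- data.frame(
  chr = c("chr1", "chr1"),
  start = c(101, 250),   # 1-based mutation coordinates
  end = c(101, 250)
)
sites <- data.frame(chr = "chr1", start = 100, end = 120)  # 0-based site coordinates

# Assumption: the 0-based interval is half-open, so converting it to
# 1-based closed coordinates shifts only the start.
site_start_1 <- sites$start + 1
site_end_1 <- sites$end

# Which mutations fall inside the site once both use the same convention?
in_site <- maf_pos$chr == sites$chr &
  maf_pos$start >= site_start_1 &
  maf_pos$end <= site_end_1

# Assumption: window_size is added to the left and right of the site midpoint.
window_size <- 100
midpoint <- floor((site_start_1 + site_end_1) / 2)
window_start <- midpoint - window_size
window_end <- midpoint + window_size
full_width <- window_end - window_start  # twice the half-width
```

Mixing the two conventions without conversion shifts positions by one base, which matters for single-base substitutions at site boundaries.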

Value

Data frame containing the regression estimates and likelihood ratio test output with the following columns: mut_type, pp, this_coef, obs_mut, exp_mut, exp_mut_lo, exp_mut_hi, fc, n_sites_tested

mut_type

A string identifying the mutation class

pp

The p-value from the likelihood ratio test

this_coef

The regression coefficient of the is_site term

obs_mut

The total number of observed mutations of that class

exp_mut

The expected number of mutations determined by the model

exp_mut_lo

Lower bound of the 95% confidence interval of exp_mut

exp_mut_hi

Upper bound of the 95% confidence interval of exp_mut

fc

Observed mutations divided by expected mutations

pp_cofac

The p-value from the likelihood ratio test of site:cofactor interaction

this_coef_cofac

The coefficient from the site:cofactor interaction term

n_sites_tested

The number of sites that were tested (all sites if no downsampling was performed)
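The relationship between obs_mut, exp_mut and fc can be checked by hand on a returned row. The sketch below uses a made-up result row. The mut_type label "total" and all counts are hypothetical, and above_interval is a helper column added only for illustration.

```r
# Made-up result row using the column names from the Value section.
result <- data.frame(
  mut_type = "total",   # hypothetical label
  obs_mut = 150,
  exp_mut = 100,
  exp_mut_lo = 85,
  exp_mut_hi = 115
)

# fc is observed mutations divided by expected mutations.
result$fc <- result$obs_mut / result$exp_mut

# Illustrative helper: is the observed count above the expected interval?
result$above_interval <- result$obs_mut > result$exp_mut_hi
```

Here fc is 1.5, meaning 50% more mutations were observed than the model expected. The interval comparison is only a quick reading aid. The likelihood ratio test p-value in pp remains the reported test.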


reimandlab/RM2 documentation built on Aug. 13, 2022, 12:22 p.m.