burden_test_WES: Calculate Fisher's exact test p-values across the whole exome


Source: R/burden_test_WES.r

Description

Calculates gene-level Fisher's exact test p-values across the whole exome.

Usage

burden_test_WES(
  cases,
  controls,
  cases_ss = NULL,
  controls_ss = NULL,
  case_coverage = NULL,
  control_coverage = NULL,
  cov_threshold = 0.5,
  alternative = "greater",
  covstats = F,
  messages = T
)

Arguments

cases

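Case variant data as a data.table with columns aff (affection status, 1 for cases), symbol (gene symbol), protein_position (residue position in the protein) and ac (allele count).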

controls

Control variant data as a data.table with the same columns as cases: aff (affection status, 0 for controls), symbol, protein_position and ac.

cases_ss

Sample size for cases: either a single number applied to all genes, or a data.table with two columns, symbol and ss, giving a sample size for each gene.

controls_ss

Sample size for controls: either a single number applied to all genes, or a data.table with two columns, symbol and ss, giving a sample size for each gene.

case_coverage

Optional coverage data for cases as a data.table with columns symbol, protein_position and over_10 (10X coverage at that residue, as a proportion between 0 and 1).

control_coverage

Optional coverage data for controls as a data.table with columns symbol, protein_position and over_10 (10X coverage at that residue, as a proportion between 0 and 1).

cov_threshold

Coverage threshold: residue positions whose mean 10X coverage falls below this value are excluded from the analysis (set to 0 to keep all residues).

alternative

Direction of the test. The default, "greater", tests only for an excess of variant counts in cases; set to "two-sided" to test in both directions.

covstats

Logical; if TRUE, include additional coverage information as a column in the results.

messages

Logical; if TRUE, print messages to the terminal (e.g. how many variants were removed by the coverage filter).
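The cases_ss and controls_ss arguments accept either a scalar or a per-gene table. As a rough illustration of how both forms can be reduced to one sample size per gene, here is a minimal base-R sketch. ss_per_gene() is a hypothetical helper, not a ClusterBurden function; it also accepts a data.table, since a data.table is a data.frame.

```r
# Hypothetical helper, not part of ClusterBurden: reduce a cases_ss or
# controls_ss argument to a named vector with one sample size per gene.
ss_per_gene <- function(ss, genes) {
  if (is.numeric(ss) && length(ss) == 1) {
    # scalar form: the same sample size applies to every gene
    return(setNames(rep(ss, length(genes)), genes))
  }
  # table form (columns symbol, ss): look up each gene's sample size;
  # genes absent from the table get NA
  setNames(ss$ss[match(genes, ss$symbol)], genes)
}

genes <- paste0("gene", 1:3)
ss_per_gene(10000, genes)
ss_per_gene(data.frame(symbol = genes, ss = c(10000, 10000, 5000)), genes)
```

Matching on symbol rather than on row order means the table form does not need to list genes in any particular order.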

Details

Performs Fisher's exact tests with coverage control across an exome-wide set of genes. A Yates continuity correction is used for zero counts. Residue positions with mean 10X coverage below cov_threshold are excluded from the analysis.
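The coverage filter described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's code: filter_by_coverage() is a hypothetical name, and "mean 10X coverage" is assumed here to be the average of the case and control over_10 values at each residue. Because it uses an inner join, variants at residues missing from either coverage table are also dropped.

```r
# Hypothetical sketch: drop variants at residues whose mean 10X coverage
# (ASSUMED to be the average of the case and control over_10 values)
# is below cov_threshold.
filter_by_coverage <- function(variants, case_cov, control_cov,
                               cov_threshold = 0.5) {
  # join case and control coverage per residue
  cov <- merge(case_cov, control_cov, by = c("symbol", "protein_position"),
               suffixes = c("_case", "_control"))
  cov$mean_over_10 <- (cov$over_10_case + cov$over_10_control) / 2
  # residues that pass the threshold
  keep <- cov[cov$mean_over_10 >= cov_threshold,
              c("symbol", "protein_position")]
  # inner join keeps only variants at passing residues
  merge(variants, keep, by = c("symbol", "protein_position"))
}

variants <- data.frame(symbol = c("g1", "g1", "g2"),
                       protein_position = c(1, 2, 1), ac = c(3, 1, 2))
case_cov <- data.frame(symbol = c("g1", "g1", "g2"),
                       protein_position = c(1, 2, 1), over_10 = c(0.9, 0.2, 0.8))
control_cov <- data.frame(symbol = c("g1", "g1", "g2"),
                          protein_position = c(1, 2, 1), over_10 = c(0.9, 0.4, 0.6))
filter_by_coverage(variants, case_cov, control_cov, 0.5)
```

Here residue 2 of g1 has a mean coverage of 0.3 and is removed; with cov_threshold = 0 every residue present in both coverage tables is kept.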

Value

A data.table containing each gene symbol and its p-value, plus additional coverage statistics when covstats = TRUE.
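A per-gene result table of this shape can be produced with base R's fisher.test(). The sketch below is a simplified illustration, not ClusterBurden's implementation: gene_fisher() is a hypothetical name, and it assumes that each gene's 2x2 table compares the summed allele count with the remainder of the sample size in cases versus controls. It takes scalar sample sizes only and omits the coverage filter and zero-count correction. Note that fisher.test() itself spells the two-sided alternative "two.sided".

```r
# Hypothetical sketch, not ClusterBurden's implementation: one Fisher's
# exact test per gene. ASSUMPTION: each gene's 2x2 table compares the summed
# allele count (ac) with the remainder of the sample size, cases vs controls.
gene_fisher <- function(cases, controls, cases_ss, controls_ss,
                        alternative = "greater") {
  genes <- sort(unique(c(as.character(cases$symbol),
                         as.character(controls$symbol))))
  p <- vapply(genes, function(g) {
    a <- sum(cases$ac[cases$symbol == g])        # total case allele count
    b <- sum(controls$ac[controls$symbol == g])  # total control allele count
    # rows: cases, controls; columns: variant count, remainder
    tab <- matrix(c(a, cases_ss - a, b, controls_ss - b),
                  nrow = 2, byrow = TRUE)
    # "greater" tests for an odds ratio above 1, i.e. an excess in cases
    fisher.test(tab, alternative = alternative)$p.value
  }, numeric(1))
  data.frame(symbol = genes, p_value = unname(p))
}

cases <- data.frame(symbol = c("g1", "g1", "g2"), ac = c(30, 20, 5))
controls <- data.frame(symbol = c("g1", "g2"), ac = c(10, 6))
gene_fisher(cases, controls, cases_ss = 1000, controls_ss = 1000)
```

With these toy counts g1 (50 case alleles against 10 control alleles) gets a very small p-value, while g2 (5 against 6) does not.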

Author(s)

Adam Waring - adam.waring@msdtc.ox.ac.uk

Examples

library(ClusterBurden)
library(data.table)

set.seed(42) # make the simulated example reproducible

N = 10000 # dataset size (number of variant records)
PL = 1000 # protein length
n_genes = 10
ss = 10000 # sample size
genes = paste0("gene", 1:n_genes) # gene names

# example case dataset
cases = data.table(aff = 1,
                   symbol = sample(genes, N, replace = TRUE),
                   protein_position = sample(PL, N, replace = TRUE),
                   ac = ceiling(rexp(N, 1.2)))

# example control dataset
controls = data.table(aff = 0,
                      symbol = sample(genes, N, replace = TRUE),
                      protein_position = sample(PL, N, replace = TRUE),
                      ac = ceiling(rexp(N, 1.2)))

# provide case and control sample sizes
cases_ss = controls_ss = ss

# p-values without coverage control
pvals = burden_test_WES(cases, controls, cases_ss, controls_ss)

# example coverage files
case_coverage = data.table(symbol = rep(genes, each = PL),
                           protein_position = rep(1:PL, n_genes),
                           over_10 = 1 - rexp(n_genes * PL, 50))

control_coverage = data.table(symbol = rep(genes, each = PL),
                              protein_position = rep(1:PL, n_genes),
                              over_10 = 1 - rexp(n_genes * PL, 50))

# p-values with coverage control
pvals = burden_test_WES(cases, controls, cases_ss, controls_ss,
                        case_coverage, control_coverage)

# example format for genes with different sample sizes
# (the last gene has half the sample size of the others)
cases_ss = controls_ss = data.table(symbol = genes, ss = c(rep(ss, 9), ss / 2))

# p-values when genes have different sample sizes
pvals = burden_test_WES(cases, controls, cases_ss, controls_ss,
                        case_coverage, control_coverage)

adamwaring/ClusterBurden documentation built on July 29, 2020, 9:50 p.m.