View source: R/burden_test_WES.r
Calculate Fisher's exact test p-values for each gene across the whole exome
burden_test_WES(
  cases,
  controls,
  cases_ss = NULL,
  controls_ss = NULL,
  case_coverage = NULL,
  control_coverage = NULL,
  cov_threshold = 0.5,
  alternative = "greater",
  covstats = F,
  messages = T
)
cases: case data in the format data.table(aff, symbol, protein_position, ac)
controls: control data in the format data.table(aff, symbol, protein_position, ac)
cases_ss: sample sizes for cases, either a scalar applied to all genes or a data.table with two columns: symbol, ss
controls_ss: sample sizes for controls, either a scalar applied to all genes or a data.table with two columns: symbol, ss
case_coverage: optional coverage data for cases in the format data.table(symbol, protein_position, over_10)
control_coverage: optional coverage data for controls in the format data.table(symbol, protein_position, over_10)
cov_threshold: coverage threshold below which a residue position is excluded from the analysis (choose 0 to keep all residues)
alternative: direction of the test; "greater" (the default) searches for excess variant counts in cases only, the other option is "two-sided"
covstats: include additional coverage information as a column in the results?
messages: print messages to the terminal (e.g. how many variants were removed by the coverage filter)?
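The coverage arguments can be pictured with a small base-R sketch. It assumes that over_10 holds the fraction of samples covered at 10X or more, that a single coverage table is used, and that positions at or above cov_threshold are kept; the toy tables and the names coverage, variants, well_covered and kept are hypothetical, and the package may combine case and control coverage differently.

```r
# Hypothetical coverage table in the documented layout:
# symbol, protein_position, over_10 (assumed: fraction of samples at >= 10X).
coverage <- data.frame(
  symbol = c("gene1", "gene1", "gene1", "gene2"),
  protein_position = c(1, 2, 3, 1),
  over_10 = c(0.95, 0.30, 0.80, 0.45)
)

# Hypothetical variant table with allele counts per residue position.
variants <- data.frame(
  symbol = c("gene1", "gene1", "gene2"),
  protein_position = c(1, 2, 1),
  ac = c(2, 1, 3)
)

cov_threshold <- 0.5

# Keep positions whose coverage is not below the threshold.
well_covered <- coverage[coverage$over_10 >= cov_threshold,
                         c("symbol", "protein_position")]

# An inner join drops variants that sit at poorly covered positions.
kept <- merge(variants, well_covered, by = c("symbol", "protein_position"))
print(kept)
```

Setting cov_threshold to 0 in this sketch keeps every position, which matches the note on the cov_threshold argument.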
Runs a Fisher's exact test with coverage control for each gene in an exome-wide set. A Yates continuity correction is used for zero counts. Residue positions with mean 10X coverage below cov_threshold are excluded from the analysis.
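As a rough illustration of the per-gene test, the base-R sketch below collapses the variant counts of one gene into a 2x2 table and passes it to fisher.test. The helper name gene_fisher and the carrier / non-carrier layout (non-carriers taken as sample size minus summed allele count) are assumptions made for illustration; the package's own table construction and its handling of zero counts may differ.

```r
# Hypothetical helper: Fisher's exact test for a single gene.
# Assumes each sample contributes at most one counted allele, so that
# non-carriers = sample size - summed allele count (a simplification).
gene_fisher <- function(case_ac, control_ac, case_n, control_n,
                        alternative = "greater") {
  tab <- matrix(
    c(case_ac,    case_n - case_ac,
      control_ac, control_n - control_ac),
    nrow = 2, byrow = TRUE,
    dimnames = list(c("case", "control"), c("carrier", "non_carrier"))
  )
  # With cases in the first row, "greater" tests for an odds ratio above 1,
  # i.e. an excess of carriers among cases.
  fisher.test(tab, alternative = alternative)$p.value
}

# Toy counts: an apparent excess in cases versus no difference.
p_excess <- gene_fisher(case_ac = 30, control_ac = 10,
                        case_n = 1000, control_n = 1000)
p_null <- gene_fisher(case_ac = 10, control_ac = 10,
                      case_n = 1000, control_n = 1000)
print(c(excess = p_excess, null = p_null))
```

Note that base R's fisher.test spells the two-sided option "two.sided"; whether burden_test_WES translates its own "two-sided" value is not stated here.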
Returns a data.table with the columns symbol and p-value, plus covstats when covstats is requested
Adam Waring - adam.waring@msdtc.ox.ac.uk
library(ClusterBurden)
library(data.table) # provides data.table(), in case it is not already attached
N = 10000 # dataset size
PL = 1000 # protein length
n_genes = 10
ss = 10000 # sample size
genes = paste0("gene", 1:n_genes) #gene names
# example case dataset
cases = data.table(aff = 1,
                   symbol = sample(genes, N, replace = TRUE),
                   protein_position = sample(PL, N, replace = TRUE),
                   ac = ceiling(rexp(N, 1.2)))
# example control dataset
controls = data.table(aff = 0,
                      symbol = sample(genes, N, replace = TRUE),
                      protein_position = sample(PL, N, replace = TRUE),
                      ac = ceiling(rexp(N, 1.2)))
# provide cases and control sample sizes
cases_ss = controls_ss = ss
# p-values without coverage control
pvals = burden_test_WES(cases, controls, cases_ss, controls_ss)
# example coverage files
case_coverage = data.table(symbol = rep(genes, each = PL),
                           protein_position = rep(1:PL, n_genes),
                           over_10 = 1 - rexp(n_genes * PL, 50))
control_coverage = data.table(symbol = rep(genes, each = PL),
                              protein_position = rep(1:PL, n_genes),
                              over_10 = 1 - rexp(n_genes * PL, 50))
# p-values with coverage control
pvals = burden_test_WES(cases, controls, cases_ss, controls_ss, case_coverage, control_coverage)
# example format for genes with different sample sizes
cases_ss = controls_ss = data.table(symbol=genes, ss=c(rep(ss, 9), ss/2))
# p-values when genes have different sample sizes
pvals = burden_test_WES(cases, controls, cases_ss, controls_ss, case_coverage, control_coverage)