BIN_test: Split data into bins and carry out a two-sample...
In adamwaring/ClusterBurden: Rare variant association utilising burden and positional information.


Split data into bins and carry out a two-sample goodness-of-fit test. Calculates a p-value for positional differences in rare missense-variant residue positions between cases and controls.

BIN_test(
  case_residues,
  control_residues,
  case_coverage = NULL,
  control_coverage = NULL,
  cov_threshold = 0.5,
  pval = T,
  method = "mann-wald",
  nbins = NULL,
  plot_resids = F
)

`case_residues`	vector of case variant residue positions
`control_residues`	vector of control variant residue positions
`case_coverage`	optional coverage data for cases in format: data.table(protein_position, over_10)
`control_coverage`	optional coverage data for controls in format: data.table(protein_position, over_10)
`cov_threshold`	threshold at which to exclude a residue from the analysis (choose 0 to keep all residues)
`pval`	return only p-value or return chi-squared test output?
`method`	method used to bin the data: either "mann-wald" or "nbins"
`nbins`	number of bins to use if method == "nbins"
`plot_resids`	should chi-squared residuals be plotted? Defaults to FALSE

The function takes vectors of case and control missense-variant residue positions (aggregated over a protein-coding region) as input and returns a p-value representing the significance of variant clustering within the gene. The linear sequence of the protein is split into 'bins', and the variant counts within each bin for each cohort are used to construct a k x 2 contingency table, where k is the number of bins and 2 is for the two cohorts: cases and controls.

The binning method is either:
- "mann-wald", where the number of bins k is determined by the total number of observed variants n by the equation k ~ n^(2/5)
- "nbins", where the user selects a specific number of bins (reasonable values here would be ~10-20 bins)

Setting "plot_resids" to true plots the residuals for each cell in the k x 2 contingency table, which allows the user to determine which cells (protein regions) contribute towards the significance of the test.
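The binning and testing steps described above can be sketched in base R. This is an illustration, not the package's source code: the equal-width bin boundaries, the rounding of k, and the use of `chisq.test` on the resulting table are assumptions made for the sketch, and the simulated inputs are hypothetical.

```r
# Illustrative sketch only: not the ClusterBurden implementation.
set.seed(1)
nresidues <- 1000                                    # assumed protein length
case_residues    <- sample(seq_len(nresidues), 100, replace = TRUE)
control_residues <- sample(seq_len(nresidues), 100, replace = TRUE)

# Mann-Wald heuristic: k ~ n^(2/5), where n is the total number of variants
n <- length(case_residues) + length(control_residues)
k <- round(n^(2/5))                                  # rounding is an assumption

# Assumption: k equal-width bins spanning the linear protein sequence
breaks <- seq(0, nresidues, length.out = k + 1)
case_counts    <- table(cut(case_residues, breaks))
control_counts <- table(cut(control_residues, breaks))

# k x 2 contingency table: one row per bin, one column per cohort
tab <- cbind(case = as.vector(case_counts),
             control = as.vector(control_counts))

# Chi-squared test on the table; residuals show which bins drive the result
res <- chisq.test(tab)
res$p.value
res$residuals
```

Large absolute residuals in a row point to the protein region where the case and control counts differ most, which is the kind of information "plot_resids" is described as displaying.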

When coverage files are supplied, regions with 10X coverage below "cov_threshold" (default = 0.5) are excluded from the analysis. For the remaining regions, cell counts are adjusted by the reciprocal of the mean coverage across the bin.
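One possible reading of the coverage adjustment is sketched below in base R. It is not taken from the package source: the interpretation of `over_10` as the fraction of samples with at least 10X coverage, the per-residue filtering, the four equal-width bins, and the use of a `data.frame` in place of a `data.table` are all assumptions, and the coverage values are made up for illustration.

```r
# Illustrative sketch only: not the ClusterBurden implementation.
set.seed(2)
nresidues <- 200                                     # assumed protein length

# Hypothetical coverage table using the documented column names
coverage <- data.frame(
  protein_position = seq_len(nresidues),
  over_10 = c(rep(0.3, 50), rep(0.9, 150))           # assumed: fraction at >= 10X
)
residues <- sample(seq_len(nresidues), 80, replace = TRUE)
cov_threshold <- 0.5

# Exclude residues whose 10X coverage falls below the threshold
retained <- coverage$over_10 >= cov_threshold
residues <- residues[residues %in% coverage$protein_position[retained]]

# Assumption: 4 equal-width bins over the protein
breaks <- seq(0, nresidues, length.out = 5)
counts <- table(cut(residues, breaks))

# Mean coverage per bin over the retained positions (NA if none are retained)
bin_of_position <- cut(coverage$protein_position, breaks)
mean_cov <- tapply(coverage$over_10[retained], bin_of_position[retained], mean)

# Scale the counts of usable bins by the reciprocal of their mean coverage
ok <- !is.na(as.vector(mean_cov))
adjusted <- as.vector(counts)[ok] / as.vector(mean_cov)[ok]
adjusted
```

In this sketch the first bin contains only poorly covered positions, so it drops out entirely, and the remaining counts are inflated slightly to compensate for incomplete coverage.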

Returns a p-value when "pval" is TRUE (the default); otherwise returns an object of class htest.
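For readers unfamiliar with the htest class, the sketch below shows the standard components of such an object, using base R's `chisq.test` on a made-up 3 x 2 table as a stand-in. That the object returned by BIN_test carries the same fields is an assumption based on the documented class, not something confirmed here.

```r
# Stand-in example: a chi-squared test on a hypothetical 3 x 2 table
tab <- cbind(case = c(12, 30, 8), control = c(20, 10, 20))
res <- chisq.test(tab)

class(res)       # "htest"
res$statistic    # chi-squared test statistic
res$parameter    # degrees of freedom: (3 - 1) * (2 - 1)
res$p.value      # the p-value on its own
res$residuals    # Pearson residuals for each cell
```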

Adam Waring - adam.waring@msdtc.ox.ac.uk

# The essential inputs are case_residues and control_residues

# Example 1: Simulated NULL data
# Bin by the Mann-Wald heuristic: k ~ n^(2/5) bins, where n = length(case_residues) + length(control_residues)
# Simulate case and control residue positions from the same distribution

nresidues = 1000 # length of the protein
probs = rexp(nresidues)^2 # relative weight of a missense variant at each residue

case_residues = sample(1:nresidues, 100, replace = TRUE, prob = probs)
control_residues = sample(1:nresidues, 100, replace = TRUE, prob = probs)

BIN_test(case_residues, control_residues)

# Example 2: Simulated DISEASE data
# Simulate case and control residue positions from different distributions

nresidues = 1000 # length of the protein
probs = rexp(nresidues)^2 # relative weight of a missense variant at each residue
# Cases are enriched in residues 201-400; controls are enriched elsewhere
case_probs = probs * rep(c(1, 3, 1), c(200, 200, 600))
control_probs = probs * rep(c(2, 1, 2), c(200, 200, 600))

case_residues = sample(1:nresidues, 100, replace = TRUE, prob = case_probs)
control_residues = sample(1:nresidues, 100, replace = TRUE, prob = control_probs)

plot_distribs(case_residues, control_residues)

BIN_test(case_residues, control_residues, plot_resids = TRUE)

adamwaring/ClusterBurden documentation built on July 29, 2020, 9:50 p.m.
