# tsum_test: Generate T sum statistics and p-values from simulation

Package: bahlolab/exSTRa (Expanded STR algorithm: detecting expansions in Illumina sequencing data)

## Description

When applied to an exstra_score object, T sum statistics are calculated as described in Tankard et al. It may also be applied to an existing exstra_tsum object, in which case the values are regenerated.

## Usage

```r
tsum_test(strscore, trim = ifelse(case_control, trim.cc, trim.all),
  trim.all = 0.15, trim.cc = 0, min.quant = 0.5, give.pvalue = TRUE,
  B = 999, correction = c("bf", "loci", "samples", "uncorrected"),
  alpha = 0.05, case_control = FALSE, early_stop = TRUE, early_A = 0.25,
  early_stop_min = 50, parallel = FALSE, cluster_n = NULL, cluster = NULL,
  keep.sim.tsum = FALSE)
```

## Arguments

- `strscore`: An exstra_score object.
- `trim`: Trim this proportion of data points at each quantile level (rounded up). Must be at least 0 and less than 0.5, but values close to 0.5 may remove all samples and hence result in an error.
- `trim.cc`: Trim value used in case-control analysis. Default of 0.
- `min.quant`: Only quantiles above this value are used in constructing the statistic.
- `give.pvalue`: Whether to calculate the p-value. As this can be slow, it can be useful to turn it off if only the T sum statistics are required.
- `B`: Number of simulations used in calculating null distributions. The denominator will be B + 1, hence values of B = 10^i - 1 will result in p-values that are decimal fractions.
- `correction`: Correction method for the `p_value()` function.
- `alpha`: Significance level for the `p_value()` function.
- `case_control`: If TRUE, only calculate for samples designated cases. Otherwise all samples are used to calculate the background distribution.
- `early_stop`: Simulation may use fewer replicates when all p-values are large, controlled with `early_A`.
- `early_A`: Simulations may stop when p.value.sd < early_A * min(p.value). Checked approximately each time the number of simulations has doubled.
- `early_stop_min`: Minimum number of simulations to run before early termination.
- `parallel`: Use the parallel package when simulating the distribution, creating the required cluster. If `cluster` is specified then this option makes no difference.
- `cluster_n`: If `parallel` is TRUE, the number of nodes in the cluster is automatically set to 1 less than the number available on your machine (but never less than 1). This option allows manual setting of the number of nodes, either fewer to free up other resources, or more to maximize available resources. If `cluster` is specified then this option makes no difference.
- `cluster`: A cluster object from the parallel package. Use if you wish to set up the cluster yourself or reuse an existing cluster.
- `keep.sim.tsum`: For inspection of simulations. If TRUE, keep all simulation T sum statistics in `output$xecs` (default FALSE).
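The note on `B` is easier to see with a small worked example. The sketch below is not exSTRa code: `mc_p_value` is a hypothetical helper, the null draws are stand-in random numbers, and it assumes the common Monte Carlo convention of adding 1 to the numerator as well as the denominator, with larger statistics treated as more extreme. The package's own simulation may differ in these details.

```r
# Hypothetical helper (not part of exSTRa): empirical p-value with a B + 1 denominator.
# Assumption: larger statistics are more extreme, and 1 is added to the numerator too.
mc_p_value <- function(observed, simulated) {
  B <- length(simulated)
  # Count simulated statistics at least as large as the observed one.
  (sum(simulated >= observed) + 1) / (B + 1)
}

set.seed(1)
simulated <- rnorm(999)  # stand-in null draws; B = 999 = 10^3 - 1
p <- mc_p_value(2.5, simulated)
p  # a multiple of 1/1000, because the denominator is B + 1 = 1000
```

Under this convention the smallest attainable p-value is 1 / (B + 1), so B also sets the resolution available before any multiple-testing correction is applied.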

## Value

An exstra_tsum object with T statistics and p-values (if calculated).

## References

Rick M. Tankard, Martin B. Delatycki, Paul J. Lockhart, Melanie Bahlo. Detecting known repeat expansions with standard protocol next generation sequencing, towards developing a single screening test for neurological repeat expansion disorders. bioRxiv 157792; doi: https://doi.org/10.1101/157792

## See Also

`tsum_p_value_summary`

## Examples

```r
library(exSTRa)

exp_test <- tsum_test(exstra_wgs_pcr_2[c("HD", "SCA6")], B = 50)
exp_test

## Not run:
exp_test_parallel <- tsum_test(exstra_wgs_pcr_2[c("HD", "SCA6")], parallel = TRUE, B = 999)
exp_test_parallel
## End(Not run)
```
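The `cluster` argument accepts a cluster that you create yourself. The following is a minimal sketch of that route, assuming exSTRa is installed and that `tsum_test` prepares the worker nodes itself (the documentation above does not say either way). The node count follows the documented default for `cluster_n`; the names `n_nodes`, `cl` and `exp_test_cluster` are illustrative only.

```r
library(parallel)

# Follow the documented default: one node fewer than available, never below 1.
cores <- detectCores()
if (is.na(cores)) cores <- 1L  # detectCores() can return NA on some platforms
n_nodes <- max(1L, cores - 1L)

# Only run the simulation when exSTRa is actually installed.
if (requireNamespace("exSTRa", quietly = TRUE)) {
  library(exSTRa)
  cl <- makeCluster(n_nodes)
  exp_test_cluster <- tryCatch(
    tsum_test(exstra_wgs_pcr_2[c("HD", "SCA6")], cluster = cl, B = 999),
    finally = stopCluster(cl)  # release the workers even if the test fails
  )
  print(exp_test_cluster)
}
```

Creating the cluster once and passing it in can be worthwhile when several calls are made in one session, since the start-up cost of the worker processes is then paid only once; in that case, move `stopCluster(cl)` to after the last call.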