wtest.high: W-test for High Order Interaction Analysis
In wtest: The W-Test for Genetic Interactions Testing


This function performs the W-test to calculate high-order interactions in case-control studies for categorical data sets. The test measures the distributional difference of target variables between cases and controls via a combined log odds ratio. The test statistic follows a Chi-squared probability distribution with data-adaptive degrees of freedom. For high-order interaction calculation, the user has three options: (1) calculate the W-test for a given set of SNPs; (2) calculate high-order interactions for variables whose main-effect p-values are smaller than a threshold (input.pval); (3) calculate high-order interactions exhaustively for all variables. Output can be filtered by p-value, so that only sets with a p-value smaller than a threshold (output.pval) are returned.

wtest.high(data, y, w.order = 3, hf1 = "default.hf1",
  hf.high.order = "default.high", which.marker = NULL,
  output.pval = NULL, sort = TRUE, input.pval = 0.1,
  input.poolsize = 10)

`data`	a data frame or matrix containing genotypes in the columns. Genotypes should be coded as (0, 1, 2) or (0, 1).
`y`	a numeric vector of 0 or 1.
`w.order`	an integer indicating the order of the interaction. For example, `w.order` = 3 for three-way interaction analysis.
`hf1`	h and f values to calculate main effect, organized as a matrix, with columns (k, h, f), k = 2 to 3.
`hf.high.order`	h and f values to calculate high-order interactions, organized as a matrix, with columns (k, h, f), where k is the number of genotype combinations of a set of SNPs.
`which.marker`	a numeric vector giving the column indices of a set of SNPs to test. The default, `which.marker` = NULL, gives an exhaustive high-order interaction calculation.
`output.pval`	a p-value threshold for filtering the output. If NULL, all the results will be listed; otherwise, the function will only output the results with p-values smaller than the `output.pval`.
`sort`	a logical value indicating whether or not to sort the output by p-values in ascending order. Default = TRUE.
`input.pval`	a p-value threshold to select markers for high-order interaction calculation, used only when `w.order` > 2. When specified, only markers with main effect p-value smaller than `input.pval` will be passed to interaction effect calculation. Default = 0.10. Set `input.pval` = NULL or 1 for exhaustive calculation.
`input.poolsize`	an integer smaller than the number of input variables. It is an optional filter that controls the maximum number of variables included in high-order interaction calculation, used only when `w.order` > 2. When specified, the function selects the top `input.poolsize` variables to calculate interactions. It can be used separately or jointly with `input.pval`; when both are given, whichever yields the smaller input pool applies. Default = 10. Set `input.poolsize` = NULL for exhaustive calculation. This filter can be useful for data exploration when a large number of variables have extremely small main effect p-values.
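Before calling wtest.high(), it can help to confirm that the inputs match the coding described in the argument list. The following is a minimal sketch in base R, not part of the wtest package: the helper name `check_wtest_inputs` and the toy data are hypothetical, and it assumes that the rows of `data` are subjects, so that `y` has one entry per row.

```r
# Hypothetical helper (not part of the wtest package): checks that the inputs
# follow the coding described in the argument list.
check_wtest_inputs <- function(data, y) {
  geno <- as.matrix(data)
  stopifnot(
    is.numeric(geno),
    all(geno %in% c(0, 1, 2)),   # genotypes coded (0, 1, 2) or (0, 1)
    is.numeric(y),
    all(y %in% c(0, 1)),         # y is a vector of 0 or 1
    nrow(geno) == length(y)      # assumption: one y value per row (subject)
  )
  invisible(TRUE)
}

# Toy data: 8 subjects, 5 markers.
set.seed(1)
toy_geno <- matrix(sample(0:2, 40, replace = TRUE), nrow = 8)
toy_y <- rep(c(0, 1), each = 4)
inputs_ok <- check_wtest_inputs(toy_geno, toy_y)
```

In this sketch a missing value (NA) also fails the check, because `NA %in% c(0, 1, 2)` is FALSE.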

The W-test is a model-free statistical test originally proposed to measure main effects or pairwise interactions in case-control studies with categorical variables. It can be extended to high-order interaction detection by the wtest.high() function. Theoretically, the test statistic follows a Chi-squared distribution with f degrees of freedom. The data-adaptive degrees of freedom f and a scalar h in the test statistic allow the W-test to correct for distributional bias due to sparse data and small sample size. Let k be the number of columns of the 2 by k contingency table formed by a single variable, a variable pair, or a higher-order set of variables. When the sample size is large and there is no population stratification, h and f are well approximated by the theoretical values h = (k-1)/k and f = k-1. When the sample size is small or there is population stratification, h and f vary to correct for the distributional bias caused by the data structure.
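As an illustration of the large-sample values above, the sketch below tabulates the theoretical h and f for k = 2 and 3 in the (k, h, f) column layout described for `hf1`, and converts a W value to a p-value with the Chi-squared distribution. It uses base R only; the helper name `theoretical_hf` and the W value 7.5 are hypothetical. The examples on this page obtain data-adaptive h and f from hf() instead.

```r
# Theoretical large-sample values stated above: h = (k - 1) / k, f = k - 1.
# Hypothetical helper, not part of the wtest package.
theoretical_hf <- function(k) {
  cbind(k = k, h = (k - 1) / k, f = k - 1)
}

hf_tab <- theoretical_hf(2:3)

# P-value for a hypothetical W value with k = 3, i.e. f = 2 degrees of freedom.
w_value <- 7.5
f3 <- hf_tab[hf_tab[, "k"] == 3, "f"]
p_value <- pchisq(w_value, df = f3, lower.tail = FALSE)   # about 0.0235
```

With data-adaptive h and f, the same `pchisq()` call applies, with `df` set to the estimated f rather than k - 1.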

When w.order > 2, wtest.high() automatically calculates the main effects first and applies a pre-filter before calculating interactions. This filtering avoids overloading memory before the user has a better understanding of the data. The user can specify a smaller input.pval, such as 0.05 or 0.001, for less output, or input.pval = 1 or NULL for exhaustive high-order interaction calculation. Another optional filter is input.poolsize, which selects the top input.poolsize variables, ranked by p-value, for high-order interaction calculation. When it is used together with input.pval, the algorithm uses the smaller of the two sets.
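The pre-filter described above can be sketched in base R as follows. This illustrates the selection rule only and is not the package's internal code: `select_pool` is a hypothetical helper and the p-values are made up. Applying both filters in sequence yields the smaller of the two candidate sets, and `choose()` shows how quickly the number of three-way sets grows with the pool size.

```r
# Hypothetical helper illustrating the selection rule (not package code).
select_pool <- function(pvals, input.pval = 0.1, input.poolsize = 10) {
  keep <- order(pvals)                       # marker indices, smallest p-value first
  if (!is.null(input.pval)) {
    keep <- keep[pvals[keep] < input.pval]   # main-effect p-value filter
  }
  if (!is.null(input.poolsize)) {
    keep <- head(keep, input.poolsize)       # cap the pool at the top-ranked markers
  }
  keep
}

main_p <- c(0.50, 0.01, 0.20, 0.03, 0.08, 0.001)  # made-up main-effect p-values

pool_a <- select_pool(main_p, input.pval = 0.1, input.poolsize = 3)    # pool size is the limit
pool_b <- select_pool(main_p, input.pval = 0.02, input.poolsize = 3)   # p-value is the limit
pool_all <- select_pool(main_p, input.pval = NULL, input.poolsize = NULL)

# Number of three-way sets to test for pools of 10 and 50 markers.
n_sets <- choose(c(10, 50), 3)
```

Here `pool_a` holds markers 6, 2 and 4, `pool_b` holds markers 6 and 2, and `n_sets` is 120 and 19600, which is why a small pool is a sensible starting point for exploration.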

An object "wtest" containing:

`order`	the "w.order" specified.
`results`	When order > 2 and which.marker = NULL, the test results include: (information on the set) [SNP names, W-value, k, p-value]; (information on the first variable in the set) [W-value, k, p-value]; (information on the second variable in the set) [W-value, k, p-value]; ...
`hf1`	The h and f values used in main effect calculation.
`hf2`	The h and f values used in high-order interaction calculation.

Rui Sun, Maggie Haitian Wang

Maggie Haitian Wang, Rui Sun, Junfeng Guo, Haoyi Weng, Jack Lee, Inchi Hu, Pak Sham and Benny C.Y. Zee (2016). A fast and powerful W-test for pairwise epistasis testing. Nucleic Acids Research. doi:10.1093/nar/gkw347.

hf, w.diagnosis, w.qqplot

data(diabetes.geno)
data(phenotype1)

## Step 1. HF Calculation
# Please note that parameter B is recommended to be greater than 400 for w.order = 1 or 2.
# For high order interaction analysis (w.order > 2), it is recommended to use default n.sample.
hf1 <- hf(data = diabetes.geno, w.order = 1, B = 100)
hf.high <- hf(data = diabetes.geno, w.order = 3, B = 30, n.marker = 10)

## Step 2. W-test Calculation
w1 <- wtest.high(diabetes.geno, phenotype1, w.order = 1, hf1 = hf1)
w3 <- wtest.high(diabetes.geno, phenotype1, w.order = 3, input.pval = 0.3,
            input.poolsize = 50, output.pval = 0.5, hf1 = hf1, hf.high.order = hf.high)
w.set <- wtest.high(diabetes.geno, phenotype1, w.order = 3, which.marker = c(10,13,20),
            hf.high.order = hf.high)