wtest.high: W-test for High Order Interaction Analysis




Description

This function performs the W-test to evaluate high-order interactions in case-control studies with categorical data. The test measures the distributional difference of the target variables between cases and controls via a combined log odds ratio. The test statistic follows a Chi-squared distribution with data-adaptive degrees of freedom. For high-order interaction calculation, the user has three options: (1) calculate the W-test for a specified set of SNPs (which.marker); (2) calculate high-order interactions for the variables whose main-effect p-values are smaller than a threshold (input.pval); (3) calculate high-order interactions exhaustively for all variables. The output can be filtered by p-value, so that only sets with a p-value smaller than a threshold (output.pval) are returned.


Usage

wtest.high(data, y, w.order = 3, hf1 = "default.hf1",
  hf.high.order = "default.high", which.marker = NULL, output.pval = NULL,
  sort = TRUE, input.pval = 0.1, input.poolsize = 10)



Arguments

data
a data frame or matrix containing genotypes in the columns. Genotypes should be coded as (0, 1, 2) or (0, 1).

y
a numeric vector of 0 or 1, or a factor variable with two levels.

w.order
an integer value indicating the order of the interaction. For example, w.order = 3 for three-way interaction analysis.

hf1
h and f values used to calculate the main effect, organized as a matrix with columns (k, h, f), k = 2 to 3.

hf.high.order
h and f values used to calculate high-order interactions, organized as a matrix with columns (k, h, f), where k is the number of genotype combinations of a set of SNPs.

which.marker
a numeric vector giving the column indices of the set of SNPs to test. The default which.marker = NULL gives an exhaustive high-order interaction calculation.

output.pval
a p-value threshold for filtering the output. If NULL, all results are listed; otherwise, only results with p-values smaller than output.pval are returned.

sort
a logical value indicating whether to sort the output by p-value in ascending order. Default = TRUE.

input.pval
a p-value threshold used to select markers for high-order interaction calculation, used only when w.order > 2. When specified, only markers with a main-effect p-value smaller than input.pval are passed to the interaction calculation. Default = 0.10. Set input.pval = NULL or 1 for exhaustive calculation.

input.poolsize
an integer smaller than the number of input variables. It is an optional filter that caps the number of variables included in the high-order interaction calculation, used only when w.order > 2. When specified, the function selects the top input.poolsize variables to calculate interactions. It can be used separately or jointly with input.pval, in which case whichever gives the smaller input pool applies. Default = 10. Set input.poolsize = NULL for exhaustive calculation. It is useful for data exploration when a large number of variables have extremely small main-effect p-values.
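The role of k in hf.high.order can be illustrated with a small base R sketch. It builds the 2 by k contingency table for a set of three simulated SNPs, where each column is one genotype combination. The data, the variable names, and the choice to count only observed combinations are assumptions made for illustration, not a description of the package internals.

```r
# Simulated genotypes coded (0, 1, 2) and a binary phenotype (made-up data)
set.seed(1)
geno <- data.frame(
  snp1 = sample(0:2, 200, replace = TRUE),
  snp2 = sample(0:2, 200, replace = TRUE),
  snp3 = sample(0:2, 200, replace = TRUE)
)
y <- rbinom(200, 1, 0.5)

# One factor level per observed genotype combination of the three SNPs
combo <- interaction(geno, drop = TRUE)

# 2 x k table: rows are controls/cases, columns are genotype combinations
tab <- table(y, combo)
k <- ncol(tab)   # at most 3^3 = 27 for three SNPs coded (0, 1, 2)
k
```

With three SNPs coded (0, 1, 2) there are at most 27 combinations, so a matrix of h and f values for high-order analysis would need a row for each k that can occur in the data.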


Details

The W-test is a model-free statistical test originally proposed to measure main effects or pairwise interactions in case-control studies with categorical variables. The wtest.high() function extends it to high-order interaction detection. Theoretically, the test statistic follows a Chi-squared distribution with f degrees of freedom. The data-adaptive degrees of freedom f and a scalar h in the test statistic allow the W-test to correct for distributional bias due to sparse data and small sample size. Let k be the number of columns of the 2 by k contingency table formed by a single variable or a set of variables. When the sample size is large and there is no population stratification, h and f are well approximated by the theoretical values h = (k-1)/k and f = k-1. When the sample size is small or there is population stratification, h and f vary to correct for the distributional bias caused by the data structure.
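The large-sample case can be sketched in base R. This page does not spell out the statistic, so the sketch assumes the form described in the cited paper: a sum over the k columns of squared standardized log odds ratios, scaled by h. It uses the theoretical values h = (k-1)/k and f = k-1 rather than values estimated by hf(). The function name w_sketch and the counts are made up for illustration.

```r
# Toy 2 x k table: row 1 = controls, row 2 = cases; columns = genotypes 0/1/2
tab <- matrix(c(40, 45, 15,
                25, 50, 25), nrow = 2, byrow = TRUE)

# Hypothetical helper, not part of the wtest package.
# Assumes no zero cells (a zero count would give an infinite log odds ratio).
w_sketch <- function(tab) {
  k  <- ncol(tab)
  n0 <- tab[1, ]; n1 <- tab[2, ]
  N0 <- sum(n0);  N1 <- sum(n1)
  # Log odds ratio of each column, cases versus controls
  log_or <- log((n1 / (N1 - n1)) / (n0 / (N0 - n0)))
  # Standard error of each log odds ratio
  se <- sqrt(1 / n0 + 1 / n1 + 1 / (N0 - n0) + 1 / (N1 - n1))
  h <- (k - 1) / k   # theoretical large-sample scalar
  f <- k - 1         # theoretical large-sample degrees of freedom
  w <- h * sum((log_or / se)^2)
  c(w = w, k = k, p.value = pchisq(w, df = f, lower.tail = FALSE))
}

res <- w_sketch(tab)
res
```

In practice the h and f passed to wtest.high() would come from hf(), which is what makes the degrees of freedom data-adaptive.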

When w.order > 2, wtest.high() automatically calculates the main effects first and then applies a pre-filter before calculating interactions. This pre-filter avoids exhausting memory before the user has a better understanding of the data. Users can specify a smaller input.pval, such as 0.05 or 0.001, for less output, or input.pval = 1 or NULL for exhaustive high-order interaction calculation. Another optional filter is input.poolsize, which selects the top input.poolsize variables, ranked by main-effect p-value, for the high-order calculation. When it is used together with input.pval, the algorithm uses whichever filter gives the smaller set.
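The pre-filter described above can be sketched in a few lines of base R. The helper select_pool and the p-values are hypothetical and only mirror the described behavior (keep markers below input.pval, then cap the pool at input.poolsize); the package's actual implementation may differ.

```r
# Hypothetical helper mirroring the described pre-filter; not a wtest function
select_pool <- function(pvals, input.pval = 0.1, input.poolsize = 10) {
  idx <- order(pvals)                       # rank markers by main-effect p-value
  if (!is.null(input.pval)) {
    idx <- idx[pvals[idx] < input.pval]     # keep markers below the threshold
  }
  if (!is.null(input.poolsize)) {
    idx <- head(idx, input.poolsize)        # cap the pool size
  }
  idx
}

# Made-up main-effect p-values for six markers
pvals <- c(0.20, 0.01, 0.04, 0.50, 0.08, 0.003)

pool <- select_pool(pvals, input.pval = 0.1, input.poolsize = 3)
all_markers <- select_pool(pvals, input.pval = NULL, input.poolsize = NULL)

# Number of three-way sets that would be tested in each case
n_sets_filtered   <- choose(length(pool), 3)
n_sets_exhaustive <- choose(length(all_markers), 3)
```

Because the number of sets grows as choose(n, w.order), even a modest reduction of the pool shrinks the interaction search considerably.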


Value

An object "wtest" containing:


the "w.order" specified.


When order > 2 and which.marker = NULL, the test results include: (information of a set) [SNPs name, W-value, k, p-value]; (Information of the first variable in the set) [W-value, k, p-value]; (Information of the second variable in the set) [W-value, k, p-value] ...


The h and f values used in main effect calculation.


The h and f values used in high-order interaction calculation.


Author(s)

Rui Sun, Maggie Haitian Wang


References

Maggie Haitian Wang, Rui Sun, Junfeng Guo, Haoyi Weng, Jack Lee, Inchi Hu, Pak Sham and Benny C.Y. Zee (2016). A fast and powerful W-test for pairwise epistasis testing. Nucleic Acids Research. doi:10.1093/nar/gkw347.

See Also

hf, w.diagnosis, w.qqplot



Examples

library(wtest)
# Load the example data (assumes these data sets ship with the wtest package)
data(diabetes.geno)
data(phenotype1)

## Step 1. HF Calculation
# Note: B is recommended to be greater than 400 for w.order = 1 or 2.
# For high-order interaction analysis (w.order > 2), the default n.sample is recommended.
hf1 <- hf(data = diabetes.geno, w.order = 1, B = 100)
hf.high <- hf(data = diabetes.geno, w.order = 3, B = 30, n.marker = 10)

## Step 2. W-test Calculation
# Main effects
w1 <- wtest.high(diabetes.geno, phenotype1, w.order = 1, hf1 = hf1)
# Three-way interactions with input and output filters
w3 <- wtest.high(diabetes.geno, phenotype1, w.order = 3, input.pval = 0.3,
            input.poolsize = 50, output.pval = 0.5, hf1 = hf1, hf.high.order = hf.high)
# W-test for one specified set of SNPs (columns 10, 13 and 20)
w.set <- wtest.high(diabetes.geno, phenotype1, w.order = 3, which.marker = c(10,13,20),
            hf.high.order = hf.high)

wtest documentation built on July 5, 2018, 1:01 a.m.