wtest.high: W-test for High Order Interaction Analysis


View source: R/wtest.high.R

Description

This function performs the W-test to detect high-order interactions in case-control studies with categorical data. The test measures the distributional difference of the target variables between cases and controls via a combined log odds ratio, and the test statistic follows a Chi-squared distribution with data-adaptive degrees of freedom. For high-order interaction analysis, the user has three options: (1) calculate the W-test for a specified set of SNPs; (2) calculate high-order interactions among the variables whose main-effect p-values are smaller than a threshold (input.pval); or (3) calculate high-order interactions exhaustively for all variables. The output can be filtered by p-value, so that only sets with p-values smaller than a threshold (output.pval) are returned.

Usage

wtest.high(data, y, w.order = 3, hf1 = "default.hf1",
  hf.high.order = "default.high", which.marker = NULL, output.pval = NULL,
  sort = TRUE, input.pval = 0.1, input.poolsize = 10)

Arguments

data

a data frame or matrix containing genotypes in the columns. Genotypes should be coded as (0, 1, 2) or (0, 1).

y

a numeric vector of 0 or 1, or a factor variable with two levels.
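As an illustration of the input conventions for data and y, the base-R sketch below checks that genotypes use the (0, 1, 2) or (0, 1) coding and converts a two-level factor phenotype to 0/1. check_wtest_inputs() is a hypothetical helper, not part of the wtest package, and mapping the first factor level to 0 is an assumption about how the levels would be interpreted.

```r
# Hypothetical helper, not part of the wtest package: validates inputs
# against the coding rules described for 'data' and 'y'.
check_wtest_inputs <- function(data, y) {
  geno <- as.matrix(data)
  # Genotypes must be coded as 0/1/2 or 0/1 (missing values are tolerated here).
  if (!all(geno %in% c(0, 1, 2, NA))) {
    stop("genotypes must be coded as (0, 1, 2) or (0, 1)")
  }
  if (is.factor(y)) {
    if (nlevels(y) != 2) stop("a factor 'y' must have exactly two levels")
    # Assumption: the first level is treated as control (0), the second as case (1).
    y <- as.integer(y) - 1L
  }
  if (!all(y %in% c(0, 1))) stop("'y' must contain only 0 and 1")
  if (length(y) != nrow(geno)) stop("'y' must have one entry per row of 'data'")
  y
}

geno <- data.frame(snp1 = c(0, 1, 2), snp2 = c(1, 0, 1))
status <- factor(c("control", "case", "case"), levels = c("control", "case"))
check_wtest_inputs(geno, status)  # returns 0 1 1
```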

w.order

an integer indicating the order of interaction to analyze. For example, w.order = 3 for three-way interaction analysis.

hf1

the h and f values used to calculate main effects, organized as a matrix with columns (k, h, f), where k = 2 to 3.

hf.high.order

the h and f values used to calculate high-order interactions, organized as a matrix with columns (k, h, f), where k is the number of genotype combinations of a set of SNPs.
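To make the definition of k concrete, the sketch below counts the distinct genotype combinations of a set of SNPs. It assumes that k counts only the combinations actually observed in the data (at most 3^n for n SNPs coded 0/1/2); the package's internal definition may differ, and count_k() is a hypothetical name.

```r
# Hypothetical helper: number of distinct genotype combinations (k)
# observed for the SNP columns in 'snps' (names or indices).
count_k <- function(data, snps) {
  cols <- as.data.frame(data)[, snps, drop = FALSE]
  # Paste each row's genotypes into one key, e.g. "0-1-2".
  keys <- do.call(paste, c(cols, sep = "-"))
  length(unique(keys))
}

geno <- data.frame(s1 = c(0, 1, 2, 0, 1),
                   s2 = c(0, 0, 1, 0, 1),
                   s3 = c(1, 1, 1, 1, 2))
count_k(geno, c("s1", "s2", "s3"))  # 4 of the 27 possible combinations
```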

which.marker

a numeric vector giving the column indices of a set of SNPs to test. The default, which.marker = NULL, performs an exhaustive high-order interaction calculation.
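The difference between a user-specified set and the default which.marker = NULL can be pictured as follows: with a set given, only that one combination is tested; with NULL, every w.order-sized combination of the candidate markers is a potential test. This is a hypothetical sketch using base R's combn(); candidate_sets() is not a wtest function, and the package may enumerate sets differently.

```r
# Hypothetical helper: each column of the returned matrix is one set of
# marker column indices to be tested jointly.
candidate_sets <- function(pool, w.order, which.marker = NULL) {
  if (!is.null(which.marker)) {
    return(matrix(which.marker, ncol = 1))  # a single user-specified set
  }
  # combn() would treat a single number n as 1:n, so require a real pool.
  if (length(pool) < max(2, w.order)) stop("pool is too small for w.order")
  combn(pool, w.order)  # all w.order-sized combinations of the pool
}

sets <- candidate_sets(pool = c(2, 5, 7, 9), w.order = 3)
ncol(sets)  # 4 candidate three-way sets
candidate_sets(pool = 1:30, w.order = 3, which.marker = c(10, 13, 20))
```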

output.pval

a p-value threshold for filtering the output. If NULL, all results are listed; otherwise, only results with p-values smaller than output.pval are returned.

sort

a logical value indicating whether or not to sort the output by p-values in ascending order. Default = TRUE.
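The combined effect of output.pval and sort can be sketched on a toy results table. filter_results() and the column names set and p.value are hypothetical; the actual columns of the returned results object are described under Value.

```r
# Hypothetical post-processing mirroring 'output.pval' and 'sort'.
filter_results <- function(res, output.pval = NULL, sort = TRUE) {
  if (!is.null(output.pval)) {
    # Keep only rows with p-values strictly below the threshold.
    res <- res[res$p.value < output.pval, , drop = FALSE]
  }
  if (sort) {
    res <- res[order(res$p.value), , drop = FALSE]  # ascending p-values
  }
  res
}

res <- data.frame(set = c("A:B:C", "A:B:D", "B:C:D"),
                  p.value = c(0.40, 0.02, 0.70),
                  stringsAsFactors = FALSE)
filter_results(res, output.pval = 0.5)$set  # "A:B:D" "A:B:C"
```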

input.pval

a p-value threshold to select markers for high-order interaction calculation, used only when w.order > 2. When specified, only markers with main effect p-value smaller than input.pval will be passed to interaction effect calculation. Default = 0.10. Set input.pval = NULL or 1 for exhaustive calculation.

input.poolsize

an integer smaller than the number of input variables. It is an optional filter that caps the number of variables included in the high-order interaction calculation, and is used only when w.order > 2. When specified, the function selects the top input.poolsize variables, ranked by main-effect p-value, for the interaction calculation. It can be used alone or together with input.pval, in which case whichever yields the smaller pool is applied. Default = 10. Set input.poolsize = NULL for exhaustive calculation. This filter is useful for data exploration when a large number of variables have extremely small main-effect p-values.
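The interplay between input.pval and input.poolsize can be sketched in base R: variables are ranked by main-effect p-value, the threshold is applied, and the ranked list is truncated to input.poolsize, so the stricter of the two filters determines the pool. select_pool() is a hypothetical helper and the p-values below are toy values for illustration.

```r
# Hypothetical helper: returns column indices of the variables kept for
# the high-order interaction calculation, smallest p-value first.
select_pool <- function(pvals, input.pval = 0.1, input.poolsize = 10) {
  keep <- order(pvals)  # rank variables by main-effect p-value
  if (!is.null(input.pval)) {
    keep <- keep[pvals[keep] < input.pval]  # threshold filter
  }
  if (!is.null(input.poolsize)) {
    keep <- head(keep, input.poolsize)  # cap the pool size
  }
  keep
}

main.p <- c(0.05, 0.30, 0.01, 0.08, 0.20)  # toy main-effect p-values
select_pool(main.p, input.pval = 0.1, input.poolsize = 2)      # 3 1
select_pool(main.p, input.pval = 0.1, input.poolsize = 10)     # 3 1 4
select_pool(main.p, input.pval = NULL, input.poolsize = NULL)  # 3 1 4 5 2
```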

Details

W-test is a model-free statistical test originally proposed to measure main effects or pairwise interactions in case-control studies with categorical variables. The wtest.high() function extends it to high-order interaction detection. Theoretically, the test statistic follows a Chi-squared distribution with f degrees of freedom. The data-adaptive degrees of freedom f and a scalar h in the test statistic allow the W-test to correct for distributional bias due to sparse data and small sample size. Let k be the number of columns of the 2 by k contingency table formed by a single variable or a variable pair (or, for high-order analysis, a set of variables). When the sample size is large and there is no population stratification, h and f are close to their theoretical values, h = (k-1)/k and f = k-1. When the sample size is small or there is population stratification, h and f deviate from these values to correct for the distributional bias caused by the data structure.
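The large-sample reference values and the p-value calculation can be written out directly. The sketch below tabulates the theoretical h = (k-1)/k and f = k-1 in the same (k, h, f) matrix layout described for hf1, and converts a W statistic to an upper-tail Chi-squared p-value with f degrees of freedom. theoretical_hf() and w_pvalue() are hypothetical helpers; in practice h and f are estimated from the data (see hf), and the W values used here are arbitrary examples.

```r
# Theoretical (large-sample, no stratification) h and f for a 2 x k table,
# as a matrix with columns (k, h, f).
theoretical_hf <- function(k) {
  cbind(k = k, h = (k - 1) / k, f = k - 1)
}

# Upper-tail Chi-squared p-value; f may be non-integer when data-adaptive.
w_pvalue <- function(W, f) {
  pchisq(W, df = f, lower.tail = FALSE)
}

theoretical_hf(2:3)
w_pvalue(5.99, f = 2)    # about 0.05
w_pvalue(5.99, f = 2.4)  # a data-adaptive f gives a different p-value
```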

When w.order > 2, wtest.high() automatically calculates the main effects first and then applies a pre-filter before calculating interactions. This filtering avoids overloading memory before the user has a better understanding of the data. Users can specify a smaller input.pval, such as 0.05 or 0.001, for fewer results, or input.pval = 1 or NULL (together with input.poolsize = NULL) for an exhaustive high-order interaction calculation. Another optional filter is input.poolsize, which selects the top input.poolsize variables, ranked by main-effect p-value, for the high-order interaction calculation. When both filters are used, the algorithm uses whichever yields the smaller set.
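The reason a pre-filter matters is combinatorial: the number of three-way sets grows roughly with the cube of the pool size. The base-R calculation below counts the sets for a few illustrative pool sizes; it says nothing about the package's actual memory use.

```r
# Number of distinct three-way sets drawn from pools of candidate variables.
pool.size <- c(10, 50, 100, 1000)
n.sets <- choose(pool.size, 3)
data.frame(pool.size, n.sets)
# 10 -> 120, 50 -> 19600, 100 -> 161700, 1000 -> 166167000
```

With the default input.poolsize = 10, a three-way analysis therefore considers at most 120 sets.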

Value

An object "wtest" containing:

order

the "w.order" specified.

results

When order > 2 and which.marker = NULL, the test results include: information on the set [SNP names, W-value, k, p-value]; information on the first variable in the set [W-value, k, p-value]; information on the second variable in the set [W-value, k, p-value]; and so on.

hf1

The h and f values used in main effect calculation.

hf2

The h and f values used in high-order interaction calculation.

Author(s)

Rui Sun, Maggie Haitian Wang

References

Maggie Haitian Wang, Rui Sun, Junfeng Guo, Haoyi Weng, Jack Lee, Inchi Hu, Pak Sham and Benny C.Y. Zee (2016). A fast and powerful W-test for pairwise epistasis testing. Nucleic Acids Research. doi:10.1093/nar/gkw347.

See Also

hf, w.diagnosis, w.qqplot

Examples

data(diabetes.geno)
data(phenotype1)

## Step 1. HF Calculation
# Please note that parameter B is recommended to be greater than 400 for w.order = 1 or 2.
# For high order interaction analysis (w.order > 2), it is recommended to use default n.sample.
hf1 <- hf(data = diabetes.geno, w.order = 1, B = 100)
hf.high <- hf(data = diabetes.geno, w.order = 3, B = 30, n.marker = 10)

## Step 2. W-test Calculation
w1 <- wtest.high(diabetes.geno, phenotype1, w.order = 1, hf1 = hf1)
w3 <- wtest.high(diabetes.geno, phenotype1, w.order = 3, input.pval = 0.3,
            input.poolsize = 50, output.pval = 0.5, hf1 = hf1, hf.high.order = hf.high)
w.set <- wtest.high(diabetes.geno, phenotype1, w.order = 3, which.marker = c(10,13,20),
            hf.high.order = hf.high)

wtest documentation built on July 5, 2018, 1:01 a.m.