amkat: Adaptive Multivariate Kernel-based Association Test
In brianpatrickneal/AMKAT: AMKAT: An Adaptive Multivariate Kernel-based Association Test

amkat

R Documentation

Adaptive Multivariate Kernel-based Association Test

Description

Performs a nonparametric test of joint association between the columns of `x` and those of `y`, optionally adjusting for covariates. The kernel-based test statistic accommodates high-dimensional data for `x` and requires no distributional assumptions. Supervised methods for feature selection and kernel selection are included.

Usage

amkat(y, x, covariates = NULL, filter_x = TRUE,
      candidate_kernels = c("lin", "quad", "gau", "exp"),
      num_permutations = 1000,
      p_value_adjustment = "pseudocount",
      num_test_statistics = 1,
      output_test_statistics = TRUE,
      output_selected_kernels = TRUE,
      output_selected_x_columns = TRUE,
      output_null_residuals = TRUE,
      output_p_value_only = FALSE)

Arguments

`y`	a numeric matrix containing data on the dependent variables, with observations indexed by row.
`x`	a numeric matrix with the same number of rows as `y` containing data on the independent variables.
`covariates`	an optional numeric matrix with the same number of rows as `y` containing data on the covariates. The number of columns cannot exceed `nrow(y) - 2`.
`filter_x`	logical, indicating whether to apply AMKAT's permutation-based filter method for feature selection. Requires `x` to have at least two columns.
`candidate_kernels`	an optional character vector specifying the kernel functions to be considered during kernel selection. For a list of valid character strings and the kernel function corresponding to each, use `listAmkatKernelFunctions()`.
`num_permutations`	an optional strictly-positive integer specifying the number of test statistics from the permutation null distribution that are to be used in approximating the P-value for the test.
`p_value_adjustment`	an optional character string specifying the method of adjustment to apply to the permutation-based P-value for the test. Acceptable values are `"pseudocount"`, `"floor"` and `"none"`.
`num_test_statistics`	an optional strictly-positive integer indicating the number of observed test statistic values to generate. If greater than 1, the mean value of the statistics is used as the observed value for testing; this stabilizes variation in the test statistic introduced by AMKAT's permutation-based filter method for feature selection. Has no effect if `filter_x = FALSE`.
`output_test_statistics`	logical, indicating whether output should include the values of all observed test statistics and permutation test statistics generated during testing. Has no effect if `output_p_value_only = TRUE`.
`output_selected_kernels`	logical, indicating whether output should include the kernel function selected for each column of `y`. Has no effect if `output_p_value_only = TRUE`.
`output_selected_x_columns`	logical, indicating whether output should include a record of which columns of `x` were selected for use in testing by AMKAT's filter method. Has no effect if `filter_x = FALSE` or if `output_p_value_only = TRUE`.
`output_null_residuals`	logical, indicating whether output should include the residuals and standard errors from the fitted null model (after covariate adjustment, if applicable). Has no effect if `output_p_value_only = TRUE`.
`output_p_value_only`	logical; if `TRUE`, the function returns only the P-value for the test rather than a list of results.

Details

A minimum requirement of 16 observations is enforced to avoid NaN values when estimating the asymptotic variance of the test statistic.

The kernel-based AMKAT test statistic incorporates individual kernel selection for each variable in `y` using a maximum statistic method. The kernel functions considered during kernel selection can be specified via the argument `candidate_kernels`. The function `listAmkatKernelFunctions()` retrieves the strings used to identify the available kernel functions, which include the Linear, Quadratic, Gaussian, Exponential and Identical-By-State (IBS) kernels. By default, all of these are considered except for the IBS kernel. See `?listAmkatKernelFunctions` for details on the individual kernel functions.
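
To make the kernel families named above concrete, the following base-R sketch builds textbook versions of four of them for a small matrix. The function names, scalings and the Gaussian bandwidth are assumptions chosen for illustration; AMKAT's own definitions are documented under `?listAmkatKernelFunctions` and may differ.

```r
# Textbook kernel matrices for an n x p matrix x (rows = observations).
# NOTE: names, scalings and bandwidth are illustrative assumptions,
# not AMKAT's definitions.
linear_kernel <- function(x) tcrossprod(x)              # K = x %*% t(x)
quadratic_kernel <- function(x) (1 + tcrossprod(x))^2   # (1 + <x_i, x_j>)^2
gaussian_kernel <- function(x, bandwidth = ncol(x)) {
  sq_dist <- as.matrix(dist(x))^2                       # squared Euclidean distances
  exp(-sq_dist / bandwidth)
}
ibs_kernel <- function(x) {
  # For values coded 0/1/2: sum over columns of (2 - |x_ik - x_jk|), divided by 2p
  manhattan <- as.matrix(dist(x, method = "manhattan"))
  1 - manhattan / (2 * ncol(x))
}

set.seed(1)
x_demo <- matrix(rbinom(10 * 5, size = 2, prob = 0.25), nrow = 10, ncol = 5)
kernels <- list(linear = linear_kernel(x_demo),
                quadratic = quadratic_kernel(x_demo),
                gaussian = gaussian_kernel(x_demo),
                ibs = ibs_kernel(x_demo))
sapply(kernels, dim)   # every kernel matrix is n x n
```

Each function returns a symmetric n x n similarity matrix, which is the object a kernel-based statistic operates on regardless of how many columns `x` has.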

Before kernel selection and computation of the test statistic, feature selection is performed (when `filter_x = TRUE`) using a filter method in which permuted data are compared to the original data using tests of Spearman's rho between columns of `x` and `y`.
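
The exact filter rule is not spelled out here, so the sketch below is only one hypothetical way a permutation-based Spearman filter could work, not AMKAT's implementation. It assumes that each column of `x` is scored by its largest absolute Spearman correlation with any column of `y`, and that a column is kept when its score exceeds the largest score seen after shuffling the rows of `x`. The helper `max_abs_rho` and the selection rule are inventions for illustration.

```r
set.seed(1)
n <- 25
y <- matrix(rnorm(4 * n), nrow = n, ncol = 4)
x <- matrix(rnorm(50 * n), nrow = n, ncol = 50)
x[, 1] <- y[, 1] + rnorm(n, sd = 0.1)   # plant one truly associated column

# Hypothetical score: strongest absolute Spearman correlation of each
# x column with any y column
max_abs_rho <- function(x, y) {
  apply(abs(cor(x, y, method = "spearman")), 1, max)
}

score_observed <- max_abs_rho(x, y)
# Shuffling the rows of x breaks any real association with y
score_permuted <- max_abs_rho(x[sample(n), , drop = FALSE], y)

# Hypothetical rule: keep columns that beat the strongest permuted score
selected_columns <- which(score_observed > max(score_permuted))
selected_columns
```

Because the permuted scores come from a random shuffle, the selected set changes from run to run, which is the source of variation that averaging over several test statistics is meant to dampen.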

Setting `num_test_statistics` to a value greater than 1 repeats the feature selection and kernel selection process to generate multiple test statistic values, whose mean is then used for testing. This reduces the variation in the test statistic introduced by the feature selection method.

The P-value for the test is computed by drawing a sample of test statistics from the permutation null distribution and comparing them to the value of the test statistic obtained from the original data. Permutation statistics are generated with feature selection and kernel selection reapplied to each permuted copy of the data. By default (`p_value_adjustment = "pseudocount"`), the calculation of the P-value includes a positive adjustment of `1/num_permutations` to avoid P-values of 0, which can never occur in a permutation test that uses all possible permutations of the data (because the identity permutation is among them). Alternatively, `p_value_adjustment = "floor"` applies a floor of `1/num_permutations` to the P-value in place of the adjustment, while `p_value_adjustment = "none"` forgoes the adjustment and allows P-values of 0.
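
The three adjustment options can be sketched in base R as follows. This is one reading of the description, not the package's code: it assumes larger statistics are more extreme, that the unadjusted P-value is the proportion of permutation statistics at least as large as the observed one, and that the pseudocount option adds `1/num_permutations` to that proportion (capped at 1). The function name `permutation_p_value` is hypothetical.

```r
# Hypothetical helper illustrating the three adjustment options.
# Assumes an upper-tailed test; AMKAT's exact arithmetic may differ.
permutation_p_value <- function(observed, perm_stats,
                                adjustment = c("pseudocount", "floor", "none")) {
  adjustment <- match.arg(adjustment)
  b <- length(perm_stats)                  # plays the role of num_permutations
  raw <- mean(perm_stats >= observed)      # unadjusted permutation P-value
  switch(adjustment,
         pseudocount = min(1, raw + 1 / b),  # shift upward by 1/b
         floor = max(raw, 1 / b),            # never report less than 1/b
         none = raw)                         # may be exactly 0
}

perm_stats <- c(1, 2, 3, 4)
permutation_p_value(5, perm_stats, "pseudocount")  # 0.25
permutation_p_value(5, perm_stats, "floor")        # 0.25
permutation_p_value(5, perm_stats, "none")         # 0
permutation_p_value(2.5, perm_stats)               # 0.75
```

Under these assumptions the pseudocount and floor options agree when no permutation statistic reaches the observed value, and differ otherwise: the pseudocount shifts every P-value upward, while the floor only affects values below `1/num_permutations`.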

Covariate adjustment is performed prior to testing by using ordinary least squares to fit a null model in which the covariate effects are modeled as linear effects. The residuals and standard errors from this model are used in place of the raw values and estimated variances for y during testing.
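
A minimal base-R sketch of this kind of covariate adjustment is shown below. It assumes the null model has an intercept plus linear covariate terms, and that the "standard error" for each column of `y` is the residual standard deviation with `n - ncol(w) - 1` degrees of freedom; both are assumptions for illustration and the package may compute these quantities differently.

```r
set.seed(1)
n <- 25
y <- matrix(rnorm(4 * n), nrow = n, ncol = 4)   # responses
w <- matrix(rnorm(2 * n), nrow = n, ncol = 2)   # covariates

# Multivariate OLS fit of the null model: intercept + linear covariate effects
null_fit <- lm(y ~ w)
null_residuals <- residuals(null_fit)            # n x ncol(y) matrix

# Assumed definition: residual standard deviation for each column of y
df_residual <- n - ncol(w) - 1
null_standard_errors <- sqrt(colSums(null_residuals^2) / df_residual)

dim(null_residuals)
null_standard_errors
```

In this sketch the residual degrees of freedom stay positive only while the number of covariate columns is at most `n - 2`, which lines up with the limit stated for the `covariates` argument.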

Value

When invoked with output_p_value_only = TRUE, amkat returns a double containing the P-value for the test; otherwise, it returns a list whose components vary depending on the arguments supplied to amkat. Components listed below are always present in the list unless otherwise specified:

`sample_size`	the row dimension of `x` and of `y`.
`y_dimension`	the column dimension of `y`.
`x_dimension`	the column dimension of `x`.
`number_of_covariates`	the column dimension of `covariates`, or `0` if no covariates were used.
`null_residuals`	a numeric matrix of the same dimensions as `y` containing the residuals from the null model (including covariates, if applicable). Not included when `output_null_residuals = FALSE`.
`null_standard_errors`	a numeric vector of length `ncol(y)` containing the standard error for each column of `null_residuals`. Not included when `output_null_residuals = FALSE`.
`filter_x`	a logical value indicating whether AMKAT's permutation-based filter method for feature selection was applied during testing.
`selected_x_columns`	if `num_test_statistics = 1`, a numeric vector containing the indices of the columns of `x` selected by AMKAT's filter method. Otherwise, a numeric matrix with `num_test_statistics` rows and `ncol(x)` columns, where the (i,j)th entry is `1` if the jth column of `x` was selected by AMKAT's filter when generating the ith test statistic and `0` otherwise. Not included if `filter_x = FALSE` or `output_selected_x_columns = FALSE`.
`candidate_kernels`	a character vector indicating the candidate kernels that were used during AMKAT's kernel selection process.
`selected_kernels`	if `num_test_statistics = 1`, a character vector of length `ncol(y)` containing the kernel function selected for each `y` variable by AMKAT's kernel selection method. If `num_test_statistics > 1`, a character matrix with `num_test_statistics` rows and `ncol(y)` columns, where the (i, j)th entry is the kernel function selected for the jth column of `y` when generating the ith test statistic. Not included if `output_selected_kernels = FALSE`.
`test_statistic_type`	a description of the test statistic used. Only included when using the mean of multiple observed test statistic values, i.e., when `filter_x = TRUE` and `num_test_statistics > 1`.
`generated_test_statistics`	a numeric vector containing the values of the observed test statistics. Only included when `output_test_statistics = TRUE` and `num_test_statistics > 1`.
`test_statistic_value`	the value of the observed test statistic, or the mean value of the observed test statistics when `num_test_statistics > 1`. Only included when `output_test_statistics = TRUE`.
`number_of_permutations`	the number of permutation test statistics used in testing.
`permutation_statistics`	a numeric vector containing the values of the permutation test statistics. Only included when `output_test_statistics = TRUE`.
`p_value_adjustment`	a character string describing the method of adjustment used for the P-value.
`p_value`	the P-value for the test.

Author(s)

Brian Neal

References

Neal, Brian and He, Tao. “An adaptive multivariate kernel-based test for association with multiple quantitative traits in high-dimensional data.” Genetic Epidemiology (not yet submitted).

Examples

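# Simulate 25 observations: 4 response variables, 200 continuous predictors
y <- matrix(rnorm(4 * 25), nrow = 25, ncol = 4)
x <- matrix(rnorm(200 * 25), nrow = 25, ncol = 200)

# Run the test with default settings
test_results <- amkat(y, x)

# Discrete predictors taking values 0, 1 or 2, plus two covariates
x <- matrix(rbinom(n = 200 * 25, size = 2, prob = 0.25),
            nrow = 25, ncol = 200)
w <- matrix(rnorm(2 * 25), nrow = 25, ncol = 2)

# Consider every available kernel, average 50 observed test statistics,
# and apply a floor of 1/num_permutations to the P-value
test_results <- amkat(y, x, covariates = w,
                      candidate_kernels = listAmkatKernelFunctions(),
                      num_permutations = 500,
                      num_test_statistics = 50,
                      p_value_adjustment = "floor")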

brianpatrickneal/AMKAT documentation built on June 15, 2022, 8:47 a.m.