amkat: Adaptive Multivariate Kernel-based Association Test

amkat    R Documentation

Adaptive Multivariate Kernel-based Association Test

Description

Performs a nonparametric, kernel-based test for joint association between the columns of x and the columns of y, optionally adjusting for covariates. The test accommodates high-dimensional data for x and makes no distributional assumptions. Includes supervised methods for feature selection and kernel selection.

Usage

amkat(y, x, covariates = NULL, filter_x = TRUE,
      candidate_kernels = c("lin", "quad", "gau", "exp"),
      num_permutations = 1000,
      p_value_adjustment = "pseudocount",
      num_test_statistics = 1,
      output_test_statistics = TRUE,
      output_selected_kernels = TRUE,
      output_selected_x_columns = TRUE,
      output_null_residuals = TRUE,
      output_p_value_only = FALSE)

Arguments

y

a numeric matrix containing data on the dependent variables, with observations indexed by row.

x

a numeric matrix with the same number of rows as y containing data on the independent variables.

covariates

an optional numeric matrix with the same number of rows as y containing data on the covariates. The number of columns cannot exceed nrow(y) - 2.

filter_x

logical, indicating whether to apply AMKAT's permutation-based filter method for feature selection. Requires x to have at least two columns.

candidate_kernels

an optional character vector specifying the kernel functions to be considered during kernel selection. For a list of valid character strings and the kernel function corresponding to each, use listAmkatKernelFunctions().

num_permutations

an optional strictly-positive integer specifying the number of test statistics from the permutation null distribution that are to be used in approximating the P-value for the test.

p_value_adjustment

an optional character string specifying the method of adjustment to apply to the permutation-based P-value for the test. Acceptable values are "pseudocount", "floor" and "none".

num_test_statistics

an optional strictly-positive integer indicating the number of observed test statistic values to generate. If greater than 1, the mean value of the statistics is used as the observed value for testing; this stabilizes variation in the test statistic introduced by AMKAT's permutation-based filter method for feature selection. Has no effect if filter_x = FALSE.

output_test_statistics

logical, indicating whether output should include the values of all observed test statistics and permutation test statistics generated during testing. Has no effect if output_p_value_only = TRUE.

output_selected_kernels

logical, indicating whether output should include the kernel function selected for each column of y. Has no effect if output_p_value_only = TRUE.

output_selected_x_columns

logical, indicating whether output should include a record of which columns of x were selected for use in testing by AMKAT's filter method. Has no effect if filter_x = FALSE or if output_p_value_only = TRUE.

output_null_residuals

logical, indicating whether output should include the residuals and standard errors from the fitted null model (after covariate adjustment, if applicable). Has no effect if output_p_value_only = TRUE.

output_p_value_only

logical; if TRUE, the function returns only the P-value for the test rather than a list of results.

Details

A minimum requirement of 16 observations is enforced to avoid NaN values when estimating the asymptotic variance of the test statistic.

The kernel-based AMKAT test statistic incorporates individual kernel selection for each variable in y using a maximum statistic method. The kernel functions considered during kernel selection can be specified via the argument candidate_kernels. The function listAmkatKernelFunctions() retrieves the strings used to identify the available kernel functions, which include the Linear, Quadratic, Gaussian, Exponential and Identical-By-State (IBS) kernels. By default, all of these are considered except for the IBS kernel. ?listAmkatKernelFunctions provides details on the individual kernel functions.
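For orientation, several of the kernel families named above can be sketched in base R using common textbook forms. The object names and the scalings below (for example, dividing squared distances by the number of columns in the Gaussian kernel) are assumptions for illustration; the package's own definitions are given in ?listAmkatKernelFunctions and may differ. The Exponential kernel is omitted because its form is not stated here.

```r
# Illustrative kernel matrices in base R; NOT the package's implementation.
set.seed(2)
x <- matrix(rbinom(10 * 5, size = 2, prob = 0.25), nrow = 10, ncol = 5)
p <- ncol(x)

# Linear kernel: inner products between rows of x
k_lin <- tcrossprod(x)

# Quadratic kernel: squared (1 + inner product)
k_quad <- (1 + tcrossprod(x))^2

# Gaussian kernel: exp(-squared Euclidean distance / p); the scale p is assumed
sq_dist <- as.matrix(dist(x))^2
k_gau <- exp(-sq_dist / p)

# IBS kernel for genotype counts in {0, 1, 2}:
# mean over columns of (2 - |x_ik - x_jk|) / 2
k_ibs <- 1 - as.matrix(dist(x, method = "manhattan")) / (2 * p)
```

Each result is an n-by-n symmetric matrix measuring similarity between pairs of observations, which is the form of input a kernel-based test statistic works with.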

Prior to performing kernel selection and computing the test statistic, feature selection is performed using a filter method in which permuted data is compared to original data using tests of Spearman's Rho between columns of x and y.
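The exact filter rule is not spelled out here, so the following base-R sketch shows only one possible reading of it. The helper min_spearman_p and the selection rule (keep a column of x when its smallest Spearman P-value against the columns of y is lower on the original data than on a row-permuted copy of x) are assumptions for illustration, not the package's algorithm.

```r
# One hypothetical permutation-based Spearman filter; NOT AMKAT's actual rule.
set.seed(3)
n <- 25
y <- matrix(rnorm(n * 2), nrow = n)
x <- matrix(rnorm(n * 6), nrow = n)
x[, 1] <- y[, 1] + rnorm(n, sd = 0.1)  # make column 1 strongly associated

# Smallest Spearman test P-value of each x column against any y column
min_spearman_p <- function(x, y) {
  apply(x, 2, function(x_col) {
    min(apply(y, 2, function(y_col) {
      cor.test(x_col, y_col, method = "spearman", exact = FALSE)$p.value
    }))
  })
}

p_original <- min_spearman_p(x, y)
# Permuting the rows of x breaks any association with y
p_permuted <- min_spearman_p(x[sample(n), , drop = FALSE], y)

# Assumed rule: keep columns that look stronger on the original data
selected <- which(p_original < p_permuted)
```

Because the permuted copy is random, the selected set changes from run to run, which is the kind of variation that averaging several test statistics is meant to dampen.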

The feature selection and kernel selection process can optionally be repeated to generate multiple test statistic values, whose mean is then used for testing. This reduces the variation in the test statistic introduced by the feature selection method.

The P-value for the test is computed by drawing a sample of test statistics from the permutation null distribution and comparing them to the value of the test statistic obtained from the original data. Permutation statistics are generated with feature selection and kernel selection reapplied to each permuted copy of the data. By default, the calculation of the P-value includes a positive adjustment of 1/num_permutations to avoid P-values of 0, which cannot occur in a permutation test that uses all possible permutations of the data (because the identity permutation is always included). Alternatively, p_value_adjustment = "floor" applies a floor of 1/num_permutations to the P-value in place of the adjustment, while p_value_adjustment = "none" forgoes the adjustment and allows P-values of 0.
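The three adjustment options can be illustrated with a short base-R sketch. The helper adjust_perm_p_value is hypothetical and not part of the package. It follows the description above literally and assumes that larger statistics are more extreme; the cap at 1 in the pseudocount case is an added assumption. The package's actual computation may differ.

```r
# Hypothetical helper illustrating the three documented adjustment options.
adjust_perm_p_value <- function(observed, perm_stats,
                                method = c("pseudocount", "floor", "none")) {
  method <- match.arg(method)
  b <- length(perm_stats)
  # Unadjusted P-value: share of permutation statistics at least as large
  # as the observed statistic (assumes larger values are more extreme)
  raw <- mean(perm_stats >= observed)
  switch(method,
         pseudocount = min(1, raw + 1 / b),  # add 1/b, capped at 1 (assumed)
         floor = max(raw, 1 / b),            # never report less than 1/b
         none = raw)                         # may return exactly 0
}

perm_stats <- c(0.1, 0.2, 0.3, 0.4)
p_pseudo <- adjust_perm_p_value(0.5, perm_stats, "pseudocount")
p_floor <- adjust_perm_p_value(0.5, perm_stats, "floor")
p_none <- adjust_perm_p_value(0.5, perm_stats, "none")
```

With an observed statistic larger than every permutation statistic, the unadjusted P-value is 0, while the other two options both return 1/4 for these four permutation statistics.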

Covariate adjustment is performed prior to testing by using ordinary least squares to fit a null model in which the covariate effects are modeled as linear effects. The residuals and standard errors from this model are used in place of the raw values and estimated variances for y during testing.
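A minimal base-R sketch of this kind of adjustment follows. It assumes that the null model includes an intercept and that the "standard error" of each column is its residual standard deviation (square root of the residual sum of squares over the residual degrees of freedom); both are assumptions, and the package may compute these quantities differently.

```r
# Illustrative covariate adjustment via ordinary least squares (base R only).
set.seed(1)
n <- 25
y <- matrix(rnorm(n * 4), nrow = n, ncol = 4)  # dependent variables
w <- matrix(rnorm(n * 2), nrow = n, ncol = 2)  # covariates

# Fit one linear null model per column of y (matrix response)
fit <- lm(y ~ w)

# Residuals have the same dimensions as y
null_residuals <- residuals(fit)

# Residual standard deviation for each column of y (assumed definition)
null_standard_errors <- sqrt(colSums(null_residuals^2) / fit$df.residual)
```

With an intercept and two covariates, 25 observations leave 22 residual degrees of freedom, which is consistent with the requirement that the number of covariate columns not exceed nrow(y) - 2.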

Value

When invoked with output_p_value_only = TRUE, amkat returns a double containing the P-value for the test; otherwise, it returns a list whose components vary depending on the arguments supplied to amkat. Components listed below are always present in the list unless otherwise specified:

sample_size

the row dimension of x and of y.

y_dimension

the column dimension of y.

x_dimension

the column dimension of x.

number_of_covariates

the column dimension of covariates, or 0 if no covariates were used.

null_residuals

a numeric matrix of the same dimensions as y containing the residuals from the null model (including covariates, if applicable). Not included when output_null_residuals = FALSE.

null_standard_errors

a numeric vector of length ncol(y) containing the standard error for each column of null_residuals. Not included when output_null_residuals = FALSE.

filter_x

a logical value indicating whether AMKAT's permutation-based filter method for feature selection was applied during testing.

selected_x_columns

if num_test_statistics = 1, a numeric vector containing the indices of the columns of x selected by AMKAT's filter method. Otherwise, a numeric matrix with num_test_statistics rows and ncol(x) columns, where the (i,j)th entry is 1 if the jth column of x was selected by AMKAT's filter when generating the ith test statistic and 0 otherwise. Not included if filter_x = FALSE or output_selected_x_columns = FALSE.

candidate_kernels

a character vector indicating the candidate kernels that were used during AMKAT's kernel selection process.

selected_kernels

if num_test_statistics = 1, a character vector of length ncol(y) containing the kernel function selected for each y variable by AMKAT's kernel selection method. If num_test_statistics > 1, a character matrix with num_test_statistics rows and ncol(y) columns, where the (i, j)th entry is the kernel function selected for the jth column of y when generating the ith test statistic. Not included if output_selected_kernels = FALSE.

test_statistic_type

a description of the test statistic used. Only included when using the mean of multiple observed test statistic values, i.e., when filter_x = TRUE and num_test_statistics > 1.

generated_test_statistics

a numeric vector containing the values of the observed test statistics. Only included when output_test_statistics = TRUE and num_test_statistics > 1.

test_statistic_value

the value of the observed test statistic, or the mean value of the observed test statistics when num_test_statistics > 1. Only included when output_test_statistics = TRUE.

number_of_permutations

the number of permutation test statistics used in testing.

permutation_statistics

a numeric vector containing the values of the permutation test statistics. Only included when output_test_statistics = TRUE.

p_value_adjustment

a character string describing the method of adjustment used for the P-value.

p_value

the P-value for the test.
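When num_test_statistics > 1, the matrix-valued components selected_x_columns and selected_kernels can be summarized across repetitions. The sketch below uses mock objects built to match the documented layouts rather than real amkat output; the variable names and the 0.5 threshold are illustrative assumptions.

```r
# Summarize mock selection records laid out as described above.
set.seed(4)
num_test_statistics <- 50
candidates <- c("lin", "quad", "gau", "exp")

# Mock 0/1 indicator matrix: rows = repetitions, columns = columns of x
selected_x_columns <- matrix(rbinom(num_test_statistics * 8, size = 1,
                                    prob = 0.3),
                             nrow = num_test_statistics, ncol = 8)

# Mock kernel record: rows = repetitions, columns = columns of y
selected_kernels <- matrix(sample(candidates, num_test_statistics * 4,
                                  replace = TRUE),
                           nrow = num_test_statistics, ncol = 4)

# Share of repetitions in which each column of x was selected
selection_frequency <- colMeans(selected_x_columns)

# Columns of x selected in at least half of the repetitions
stable_columns <- which(selection_frequency >= 0.5)

# Most frequently selected kernel for each column of y
most_frequent_kernel <- apply(selected_kernels, 2, function(kernel_col) {
  names(which.max(table(kernel_col)))
})
```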

Author(s)

Brian Neal

References

Neal, Brian and He, Tao. “An adaptive multivariate kernel-based test for association with multiple quantitative traits in high-dimensional data.” Genetic Epidemiology (not yet submitted).

Examples

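# Simulate 25 observations: 4 response variables, 200 continuous predictors
y <- matrix(rnorm(4 * 25), nrow = 25, ncol = 4)
x <- matrix(rnorm(200 * 25), nrow = 25, ncol = 200)
# Run the test with default settings
test_results <- amkat(y, x)

# Genotype-like predictors (counts in 0, 1, 2) and two covariates
x <- matrix(rbinom(n = 200 * 25, size = 2, prob = 0.25),
            nrow = 25, ncol = 200)
w <- matrix(rnorm(2 * 25), nrow = 25, ncol = 2)
# Consider every available kernel, average 50 observed test statistics,
# and apply a floor to the P-value instead of the default adjustment
test_results <-
  amkat(y, x, covariates = w,
        candidate_kernels = listAmkatKernelFunctions(),
        num_permutations = 500,
        num_test_statistics = 50,
        p_value_adjustment = "floor")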

brianpatrickneal/AMKAT documentation built on June 15, 2022, 8:47 a.m.