amkat | R Documentation |
Tests nonparametrically for joint association between the columns of x and those of y while optionally adjusting for covariates. The test uses a kernel-based statistic that accommodates high-dimensional data for x without distributional assumptions. Includes supervised methods for feature selection and kernel selection.
amkat(y, x, covariates = NULL, filter_x = TRUE,
      candidate_kernels = c("lin", "quad", "gau", "exp"),
      num_permutations = 1000, p_value_adjustment = "pseudocount",
      num_test_statistics = 1, output_test_statistics = TRUE,
      output_selected_kernels = TRUE, output_selected_x_columns = TRUE,
      output_null_residuals = TRUE, output_p_value_only = FALSE)
y |
a numeric matrix containing data on the dependent variables, with observations indexed by row. |
x |
a numeric matrix with the same number of rows as |
covariates |
an optional numeric matrix with the same number of rows as |
filter_x |
logical, indicating whether to apply AMKAT's permutation-based filter method for feature selection. Requires |
candidate_kernels |
an optional character vector specifying the kernel functions to be considered during kernel selection. For a list of valid character strings and the kernel function corresponding to each, use |
num_permutations |
an optional strictly-positive integer specifying the number of test statistics from the permutation null distribution that are to be used in approximating the P-value for the test. |
p_value_adjustment |
an optional character string specifying the method of adjustment to apply to the permutation-based P-value for the test. Acceptable values are |
num_test_statistics |
an optional strictly-positive integer indicating the number of observed test statistic values to generate. If greater than 1, the mean value of the statistics is used as the observed value for testing; this stabilizes variation in the test statistic introduced by AMKAT's permutation-based filter method for feature selection. Has no effect if |
output_test_statistics |
logical, indicating whether output should include the values of all observed test statistics and permutation test statistics generated during testing. Has no effect if |
output_selected_kernels |
logical, indicating whether output should include the kernel function selected for each column of |
output_selected_x_columns |
logical, indicating whether output should include a record of which columns of |
output_null_residuals |
logical, indicating whether output should include the residuals and standard errors from the fitted null model (after covariate adjustment, if applicable). Has no effect if |
output_p_value_only |
logical; if |
A minimum requirement of 16 observations is enforced to avoid NaN
values when estimating the asymptotic variance of the test statistic.
The kernel-based AMKAT test statistic incorporates individual kernel selection for each variable in y using a maximum statistic method. The kernel functions considered during kernel selection can be specified via the argument candidate_kernels. The function listAmkatKernelFunctions() retrieves the strings used to identify the available kernel functions, which include the Linear, Quadratic, Gaussian, Exponential and Identical-By-State (IBS) kernels. By default, all of these are considered except for the IBS kernel. See ?listAmkatKernelFunctions for details on the individual kernel functions.
Prior to performing kernel selection and computing the test statistic, feature selection is performed using a filter method in which permuted data is compared to original data using tests of Spearman's Rho between columns of x and y.
An option is included to repeat the feature selection and kernel selection process, generating multiple test statistic values and using their mean for testing (see num_test_statistics). This reduces variation in the test statistic introduced by the feature selection method.
The P-value for the test is computed by drawing a sample of test statistics from the permutation null distribution and comparing them to the value of the test statistic obtained from the original data. Permutation statistics are generated with feature selection and kernel selection reapplied to each permuted copy of the data. By default, the calculation of the P-value includes a positive adjustment of 1/num_permutations to avoid P-values of 0, which are never possible for a permutation test using all possible permutations of the data (due to the identity permutation). Alternatively, p_value_adjustment = "floor" may be used to apply a floor of 1/num_permutations to the P-value in place of the adjustment, while p_value_adjustment = "none" will forgo the adjustment and allow for P-values of 0.
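The three adjustment options can be illustrated with a small base R sketch. The helper perm_p_value() is hypothetical and is not part of the package; it takes the permutation statistics as given (no feature or kernel selection), and it assumes that the "pseudocount" option adds 1/num_permutations to the raw proportion and caps the result at 1, which may differ from the exact formula used by amkat.

```r
# Hypothetical helper for illustration only; not part of the amkat package.
perm_p_value <- function(observed, perm_stats,
                         adjustment = c("pseudocount", "floor", "none")) {
  adjustment <- match.arg(adjustment)
  b <- length(perm_stats)
  # Raw P-value: proportion of permutation statistics at least as large
  # as the observed statistic.
  raw <- mean(perm_stats >= observed)
  switch(adjustment,
    pseudocount = min(1, raw + 1 / b),  # assumed form of the adjustment
    floor       = max(raw, 1 / b),      # never report less than 1/b
    none        = raw                   # P-values of 0 are possible
  )
}

set.seed(1)
perm_stats <- rnorm(1000)
observed <- max(perm_stats) + 1  # more extreme than every permutation value

perm_p_value(observed, perm_stats, "pseudocount")  # 0.001
perm_p_value(observed, perm_stats, "floor")        # 0.001
perm_p_value(observed, perm_stats, "none")         # 0
```

Under these assumptions, "pseudocount" and "floor" agree when no permutation statistic reaches the observed value, and differ otherwise: the floor leaves larger raw P-values unchanged, while the pseudocount shifts every P-value upward.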
Covariate adjustment is performed prior to testing by using ordinary least squares to fit a null model in which the covariate effects are modeled as linear effects. The residuals and standard errors from this model are used in place of the raw values and estimated variances for y during testing.
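A minimal base R sketch of this kind of covariate adjustment is shown below. It uses lm() on simulated data rather than the package's internal code, and it assumes that the "standard errors" are the residual standard deviations of each column of y under the null model; the object names are illustrative.

```r
set.seed(42)
n <- 25
w <- matrix(rnorm(2 * n), nrow = n, ncol = 2)  # simulated covariates
# Simulated responses with linear covariate effects plus noise.
y <- w %*% matrix(c(1, -1, 0.5, 2), nrow = 2) +
  matrix(rnorm(2 * n), nrow = n, ncol = 2)

# Ordinary least squares fit of the null model (intercept + covariates),
# fitted to all columns of y at once.
null_fit <- lm(y ~ w)

# Residuals: one column per column of y, same dimensions as y.
null_residuals <- residuals(null_fit)

# Assumed definition: residual standard deviation for each column of y.
null_standard_errors <- sqrt(colSums(null_residuals^2) / null_fit$df.residual)

dim(null_residuals)
null_standard_errors
```

The residuals are orthogonal to the intercept and covariate columns by construction, so any association later detected between them and x cannot be explained by a linear effect of the covariates.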
When invoked with output_p_value_only = TRUE, amkat returns a double containing the P-value for the test; otherwise, it returns a list whose components vary depending on the arguments supplied to amkat. Components listed below are always present in the list unless otherwise specified:
sample_size |
the row dimension of |
y_dimension |
the column dimension of |
x_dimension |
the column dimension of |
number_of_covariates |
the column dimension of |
null_residuals |
a numeric matrix of the same dimensions as |
null_standard_errors |
a numeric vector of length |
filter_x |
a logical value indicating whether AMKAT's permutation-based filter method for feature selection was applied during testing. |
selected_x_columns |
if |
candidate_kernels |
a character vector indicating the candidate kernels that were used during AMKAT's kernel selection process. |
selected_kernels |
if |
test_statistic_type |
a description of the test statistic used. Only included when using the mean of multiple observed test statistic values, i.e., when |
generated_test_statistics |
a numeric vector containing the values of the observed test statistics. Only included when |
test_statistic_value |
the value of the observed test statistic, or the mean value of the observed test statistics when |
number_of_permutations |
the number of permutation test statistics used in testing. |
permutation_statistics |
a numeric vector containing the values of the permutation test statistics. Only included when |
p_value_adjustment |
a character string describing the method of adjustment used for the P-value. |
p_value |
the P-value for the test. |
Brian Neal
Neal, Brian and He, Tao. “An adaptive multivariate kernel-based test for association with multiple quantitative traits in high-dimensional data.” Genetic Epidemiology (not yet submitted).
# Continuous x with default settings
y <- matrix(rnorm(4 * 25), nrow = 25, ncol = 4)
x <- matrix(rnorm(200 * 25), nrow = 25, ncol = 200)
test_results <- amkat(y, x)

# Count-valued x (0, 1 or 2) with covariates and all available kernels
x <- matrix(rbinom(n = 200 * 25, size = 2, prob = 0.25), nrow = 25, ncol = 200)
w <- matrix(rnorm(2 * 25), nrow = 25, ncol = 2)
test_results <- amkat(y, x, covariates = w,
                      candidate_kernels = listAmkatKernelFunctions(),
                      num_permutations = 500, num_test_statistics = 50,
                      p_value_adjustment = "floor")