mdmr: Conduct MDMR with analytic p-values
In dmcartor/mdmr: Multivariate Distance Matrix Regression

Description Usage Arguments Details Value Author(s) References Examples

mdmr (multivariate distance matrix regression) is used to regress a distance matrix onto a set of predictors. It returns the test statistic, pseudo R-square statistic, and analytic p-values for all predictors jointly and for each predictor individually, conditioned on the rest.

1
2
3

mdmr(X, D = NULL, G = NULL, lambda = NULL, return.lambda = F,
  start.acc = 1e-20, ncores = 1, perm.p = (nrow(as.matrix(X)) < 200),
  nperm = 500, seed = NULL)

`X`	A n x p matrix or data frame of predictors. Unordered factors will be tested with contrast-codes by default, and ordered factors will be tested with polynomial contrasts. For finer control of how categorical predictors are handled, or if higher-order effects are desired, the output from a call to `model.matrix()` can be supplied to this argument as well.
`D`	Distance matrix computed on the outcome data. Can be either a matrix or an R `dist` object. Either `D` or `G` must be passed to `mdmr()`.
`G`	Gower's centered similarity matrix computed from `D`. Either `D` or `G` must be passed to `mdmr`.
`lambda`	Optional argument: Eigenvalues of `G`. Eigendecomposition of large `G` matrices can be somewhat time consuming, and the theoretical p-values require the eigenvalues of `G`. If MDMR is to be conducted multiple times on one distance matrix, it is advised to conduct the eigendecomposition once and pass the eigenvalues to `mdmr()` directly each time.
`return.lambda`	Logical; indicates whether or not the eigenvalues of `G` should be returned, if calculated. Default is `FALSE`.
`start.acc`	Starting accuracy of the Davies (1980) algorithm implemented in the `davies` function in the `CompQuadForm` package (Duchesne & De Micheaux, 2010) that `mdmr()` uses to compute MDMR p-values.
`ncores`	Integer; if `ncores` > 1, the `parallel` package is used to speed computation. Note: Windows users must set `ncores = 1` because the `parallel` pacakge relies on forking. See `mc.cores` in the `mclapply` function in the `parallel` pacakge for more details.
`perm.p`	Logical: should permutation-based p-values be computed instead of analytic p-values? Default behavior is `TRUE` if `n < 200` and `FALSE` otherwise because the anlytic p-values depend on asymptotics. for `n > 200` and "permutation" otherwise.
`nperm`	Number of permutations to use if permutation-based p-values are to be computed.
`seed`	Random seed to use to generate the permutation null distribution. Defaults to a random seed.

This function is the fastest approach to conducting MDMR. It uses the fastest known computational strategy to compute the MDMR test statistic (see Appendix A of McArtor et al., 2017), and it uses fast, analytic p-values.

The slowest part of conducting MDMR is now the necessary eigendecomposition of the G matrix, whose computation time is a function of n^3. If MDMR is to be conducted multiple times on the same distance matrix, it is recommended to compute eigenvalues of G in advance and pass them to the function rather than computing them every time mdmr is called, as is the case if the argument lambda is left NULL.

The distance matrix D can be passed to mdmr as either a distance object or a symmetric matrix.

An object with six elements and a summary function. Calling summary(mdmr.res) produces a data frame comprised of:

`Statistic`	Value of the corresponding MDMR test statistic
`Numer DF`	Numerator degrees of freedom for the corresponding effect
`Pseudo R2`	Size of the corresponding effect on the distance matrix
`p-value`	The p-value for each effect.

In addition to the information in the three columns comprising summary(res), the res object also contains:

`p.prec`	A data.frame reporting the precision of each p-value. If analytic p-values were computed, these are the maximum error bound of the p-values reported by the `davies` function in `CompQuadForm`. If permutation p-values were computed, it is the standard error of each permutation p-value.
`lambda`	A vector of the eigenvalues of `G` (if `return.lambda = T`).
`nperm`	Number of permutations used. Will read `NA` if analytic p-values were computed

Note that the printed output of summary(res) will truncate p-values to the smallest trustworthy values, but the object returned by summary(res) will contain the p-values as computed. The reason for this truncation differs for analytic and permutation p-values. For an analytic p-value, if the error bound of the Davies algorithm is larger than the p-value, the only conclusion that can be drawn with certainty is that the p-value is smaller than (or equal to) the error bound. For a permutation test, the estimated p-value will be zero if no permuted test statistics are greater than the observed statistic, but the zero p-value is only a product of the finite number of permutations conduted. The only conclusion that can be drawn is that the p-value is smaller than 1/nperm.

Daniel B. McArtor (dmcartor@gmail.com) [aut, cre]

Davies, R. B. (1980). The Distribution of a Linear Combination of chi-square Random Variables. Journal of the Royal Statistical Society. Series C (Applied Statistics), 29(3), 323-333.

Duchesne, P., & De Micheaux, P. L. (2010). Computing the distribution of quadratic forms: Further comparisons between the Liu-Tang-Zhang approximation and exact methods. Computational Statistics and Data Analysis, 54(4), 858-862.

McArtor, D. B., Lubke, G. H., & Bergeman, C. S. (2017). Extending multivariate distance matrix regression with an effect size measure and the distribution of the test statistic. Psychometrika, 82, 1052-1077.

# --- The following two approaches yield equivalent results --- #
# Approach 1
data(mdmrdata)
D <- dist(Y.mdmr, method = "euclidean")
res1 <- mdmr(X = X.mdmr, D = D)
summary(res1)

# Approach 2
data(mdmrdata)
D <- dist(Y.mdmr, method = "euclidean")
G <- gower(D)
res2 <- mdmr(X = X.mdmr, G = G)
summary(res2)