discr.test.two_sample: Discriminability Two Sample Permutation Test
In neurodata/r-mgc: Multiscale Graph Correlation


A function that takes two sets of paired data and tests whether the first set is more, less, or non-equally discriminable compared with the second.

discr.test.two_sample(
  X1,
  X2,
  Y,
  dist.xfm = mgc.distance,
  dist.params = list(method = "euclidian"),
  dist.return = NULL,
  remove.isolates = TRUE,
  nperm = 500,
  no_cores = 1,
  alt = "greater"
)

`X1`	is interpreted as a `[n x d]` data matrix with `n` samples in `d` dimensions. Should NOT be a distance matrix.
`X2`	is interpreted as a `[n x d]` data matrix with `n` samples in `d` dimensions. Should NOT be a distance matrix.
`Y`	`[n]` a vector containing the sample ids for our `n` samples. Should be matched such that `Y[i]` is the corresponding label for `X1[i,]` and `X2[i,]`.
`dist.xfm`	a distance function used to transform `X1` and `X2` into distance matrices. It should accept an `[n x d]` matrix of `n` samples in `d` dimensions and return a `[n x n]` distance matrix, either directly or as the element selected by `dist.return` (for example `$D`). See mgc.distance for details.
`dist.params`	a list of trailing arguments to pass to the distance function specified in `dist.xfm`. Defaults to `list(method='euclidean')`.
`dist.return`	the return argument for the specified `dist.xfm` containing the distance matrix. Defaults to `NULL`. If `is.null(dist.return)`, the value returned by `dist.xfm` is used directly as the distance matrix. If `is.character(dist.return) | is.integer(dist.return)`, the element `[[dist.return]]` of the value returned by `dist.xfm` is used as the distance matrix. In both cases the result should be a `[n x n]` matrix.
`remove.isolates`	remove isolated samples from the dataset. Isolated samples are samples with only one instance of their class appearing in the `Y` vector. Defaults to `TRUE`.
`nperm`	the number of permutations for the permutation test. Defaults to `500`.
`no_cores`	the number of cores to use for the permutations. Defaults to `1`.
`alt`	the alternative hypothesis. Can be that the first dataset is more discriminable (`alt = 'greater'`), less discriminable (`alt = 'less'`), or simply not equally discriminable (`alt = 'neq'`). Defaults to `"greater"`.
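
The contract for `dist.xfm` and `dist.return` described above can be illustrated with a small sketch. The function `manhattan.xfm` is hypothetical and not part of mgc; it only assumes base R's `dist`. How the package forwards `dist.params` to the distance function is also an assumption here, so the package call is left as a comment.

```r
# Hypothetical distance function (not part of mgc): pairwise distances
# between the rows of X, returned in a list element named "D".
manhattan.xfm <- function(X, method = "manhattan") {
  list(D = as.matrix(dist(X, method = method)))
}

# three samples in two dimensions
X <- matrix(c(0, 0,
              1, 1,
              3, 0), ncol = 2, byrow = TRUE)

out <- manhattan.xfm(X)
D <- out[["D"]]  # the element that dist.return = "D" would select
dim(D)           # a [n x n] matrix, here 3 x 3

# Assumed usage with the package (requires mgc; not run here):
# discr.test.two_sample(X1, X2, Y, dist.xfm = manhattan.xfm,
#                       dist.params = list(method = "manhattan"),
#                       dist.return = "D")
```

Returning the matrix inside a named list keeps the function usable with a character `dist.return`; a function that returns the bare matrix would instead be paired with `dist.return = NULL`.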

A list containing the following:

`stat`	the observed test statistic: the difference in discriminability of `X1` vs `X2`.
`discr`	the discriminabilities for each of the two data sets, as a list.
`null`	the null distribution of the test statistic, computed via permutation.
`p.value`	the p-value associated with the test.
`alt`	the alternative hypothesis for the test.
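
To show how `stat`, `null`, `alt`, and `p.value` relate, here is a minimal sketch of a permutation p-value in base R. The helper `perm.pvalue` is hypothetical, and the `+ 1` correction is a common convention assumed for illustration; the package's exact computation may differ.

```r
# Hypothetical helper (not part of mgc): permutation p-value from an observed
# statistic and a vector of null statistics. The + 1 terms count the observed
# statistic as one of the permutations, so the p-value is never exactly 0.
perm.pvalue <- function(stat, null, alt = "greater") {
  extreme <- switch(alt,
    greater = sum(null >= stat),           # null at least as large as observed
    less    = sum(null <= stat),           # null at least as small as observed
    neq     = sum(abs(null) >= abs(stat)), # null at least as far from zero
    stop("alt must be 'greater', 'less', or 'neq'"))
  (extreme + 1) / (length(null) + 1)
}

# toy null distribution of 9 permuted statistics and an observed statistic
null <- c(-0.2, -0.1, 0, 0.1, 0.2, 0.3, -0.3, 0.05, -0.05)
stat <- 0.25

p.greater <- perm.pvalue(stat, null, "greater")  # 0.2
p.less    <- perm.pvalue(stat, null, "less")     # 0.9
p.neq     <- perm.pvalue(stat, null, "neq")      # 0.3
```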

A function that performs a two-sample test of whether the discriminability of one dataset differs from that of another, as described in Bridgeford et al. (2019). With $\hat{D}_{x_1}$ the sample discriminability of one approach and $\hat{D}_{x_2}$ the sample discriminability of another approach, the null hypothesis is:

$$H_0: D_{x_1} = D_{x_2}$$

and the default alternative is:

$$H_A: D_{x_1} > D_{x_2}$$

Tests against the alternatives $D_{x_1} < D_{x_2}$ and $D_{x_1} \neq D_{x_2}$ are also implemented.
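
As a rough illustration of the quantity being compared, the sketch below computes a sample discriminability from a distance matrix: the average fraction of between-subject distances that exceed a given within-subject distance. The helper `sample.discr` is hypothetical, written in base R, and counts ties as one half by assumption; it is not the package's implementation.

```r
# Hypothetical helper (not part of mgc): sample discriminability from a
# [n x n] distance matrix D and a vector of subject ids Y.
sample.discr <- function(D, Y) {
  rates <- c()
  for (i in seq_along(Y)) {
    same  <- setdiff(which(Y == Y[i]), i)  # other measurements of the same subject
    other <- which(Y != Y[i])              # measurements of different subjects
    for (j in same) {
      # fraction of between-subject distances larger than this
      # within-subject distance; ties count as one half
      rates <- c(rates, mean((D[i, other] > D[i, j]) +
                             0.5 * (D[i, other] == D[i, j])))
    }
  }
  mean(rates)
}

# two subjects with two measurements each
Y <- c(1, 1, 2, 2)
# well-separated subjects: repeated measurements lie close together
Xa <- rbind(c(0, 0), c(0.1, 0), c(5, 5), c(5.1, 5))
# same points, but the rows are assigned so that repeats lie far apart
Xb <- rbind(c(0, 0), c(5, 5), c(0.1, 0), c(5.1, 5))

discr.a <- sample.discr(as.matrix(dist(Xa)), Y)
discr.b <- sample.discr(as.matrix(dist(Xb)), Y)
stat <- discr.a - discr.b  # analogous to the observed test statistic
```

Here `discr.a` is larger than `discr.b`, so the difference is positive, which is the direction tested by `alt = "greater"`.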

Eric Bridgeford

Eric W. Bridgeford, et al. "Optimal Decisions for Reference Pipelines and Datasets: Applications in Connectomics." bioRxiv (2019).

## Not run: 
require(mgc)
require(MASS)

n <- 100; d <- 2  # n samples in total, in d dimensions

# ground truths for two subjects; the mean of subject 1 (column 1)
# differs from the mean of subject 2 (column 2)
mus <- cbind(c(0, 0), c(1, 1))
Sigma <- diag(d)  # dimensions are independent

# first dataset X1 contains less noise than X2
X1 <- do.call(rbind, lapply(1:dim(mus)[2],
  function(k) {mvrnorm(n=n/2, mus[,k], 0.5*Sigma)}))
X2 <- do.call(rbind, lapply(1:dim(mus)[2],
  function(k) {mvrnorm(n=n/2, mus[,k], 2*Sigma)}))
# subject ids: the first n/2 rows belong to subject 1, the rest to subject 2
Y <- do.call(c, lapply(1:2, function(i) rep(i, n/2)))

# X1 should be more discriminable, as it has less noise
discr.test.two_sample(X1, X2, Y, alt="greater")$p.value  # p-value is small

## End(Not run)