SC | R Documentation |
Performs the graph-based multi-sample test for high-dimensional data proposed by Song and Chen (2022). The implementation here uses the gtestsmulti function from the gTestsMulti package.
SC(X1, X2, ..., n.perm = 0, dist.fun = stats::dist, graph.fun = MST,
dist.args = NULL, graph.args = NULL, type = "S", seed = 42)
X1 |
First dataset as matrix or data.frame |
X2 |
Second dataset as matrix or data.frame |
... |
Optionally more datasets as matrices or data.frames |
n.perm |
Number of permutations for permutation test (default: 0, no permutation test performed) |
dist.fun |
Function for calculating a distance matrix on the pooled dataset (default: stats::dist) |
graph.fun |
Function for calculating a similarity graph using the distance matrix on the pooled sample (default: MST) |
dist.args |
Named list of further arguments passed to dist.fun (default: NULL) |
graph.args |
Named list of further arguments passed to graph.fun (default: NULL) |
type |
Character specifying the test statistic to use. Possible options are "S" (default) and "SA" |
seed |
Random seed (default: 42) |
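The dist.fun / dist.args pair can be pictured as a function plus a named list that is spliced into the call. The following is a minimal base R sketch of that forwarding pattern, assuming dist.args is handed to dist.fun together with the pooled data; it is an illustration, not the package's internal code, and the variable names are hypothetical.

```r
# Hypothetical illustration of argument forwarding; not the package's internal code.
set.seed(42)
X1 <- matrix(rnorm(50), ncol = 5)
X2 <- matrix(rnorm(50, mean = 0.5), ncol = 5)

# Pool the samples row-wise, since the distance matrix is computed on the pooled dataset
pooled <- rbind(X1, X2)

# A named list of extra arguments, in the shape expected for dist.args
dist.args <- list(method = "manhattan")

# do.call() splices the list into the call, i.e. stats::dist(pooled, method = "manhattan")
d.forwarded <- do.call(stats::dist, c(list(pooled), dist.args))
d.direct <- stats::dist(pooled, method = "manhattan")

# Both routes give the same distance matrix
all.equal(as.matrix(d.forwarded), as.matrix(d.direct))
```

Under the same assumption, a call such as SC(X1, X2, dist.args = list(method = "manhattan")) would be expected to switch the distance from the Euclidean default of stats::dist to the Manhattan distance.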
Two multi-sample test statistics are defined by Song and Chen (2022) based on a similarity graph. The first one is defined as
S = S_W + S_B, \text{ where}
S_W = (R_W - \text{E}(R_W))^T \Sigma_W^{-1}(R_W - \text{E}(R_W)),
S_B = (R_B - \text{E}(R_B))^T \Sigma_B^{-1}(R_B - \text{E}(R_B)),
with R_W denoting the vector of within-sample edge counts and R_B the vector of between-sample edge counts. The expectations and the covariance matrices \Sigma_W and \Sigma_B are calculated under the null hypothesis.
The second statistic is defined as
S_A = (R_A - \text{E}(R_A))^T \Sigma_A^{-1}(R_A - \text{E}(R_A)),
where R_A is the vector of all linearly independent edge counts, i.e. the edge counts for all pairs of samples except the last pair k-1 and k, and \Sigma_A is its covariance matrix under the null hypothesis.
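To make the edge-count vectors concrete, the sketch below builds a minimum spanning tree on a pooled three-sample dataset and tabulates within-sample and between-sample edges. It is a base R illustration under stated assumptions: prim_mst is a hypothetical helper written here as a stand-in for the package's MST function, and the null expectations and covariance matrices needed for the statistics themselves are not computed.

```r
set.seed(1)
X1 <- matrix(rnorm(40), ncol = 4)
X2 <- matrix(rnorm(40, mean = 1), ncol = 4)
X3 <- matrix(rnorm(40, mean = 2), ncol = 4)

# Pool the samples and remember which sample each row came from
pooled <- rbind(X1, X2, X3)
labels <- rep(1:3, times = c(nrow(X1), nrow(X2), nrow(X3)))
D <- as.matrix(stats::dist(pooled))

# Hypothetical helper: Prim's algorithm, growing the tree by repeatedly adding
# the shortest edge that joins a tree node to a non-tree node
prim_mst <- function(D) {
  n <- nrow(D)
  in.tree <- c(TRUE, rep(FALSE, n - 1))
  edges <- matrix(NA_integer_, nrow = n - 1, ncol = 2)
  for (e in seq_len(n - 1)) {
    sub <- D[in.tree, !in.tree, drop = FALSE]
    idx <- which(sub == min(sub), arr.ind = TRUE)[1, ]
    edges[e, ] <- c(which(in.tree)[idx[1]], which(!in.tree)[idx[2]])
    in.tree[edges[e, 2]] <- TRUE
  }
  edges
}

edges <- prim_mst(D)

# Sample labels of the two endpoints of every edge, ordered so that lo <= hi
g1 <- labels[edges[, 1]]
g2 <- labels[edges[, 2]]
lo <- pmin(g1, g2)
hi <- pmax(g1, g2)

# Entry (i, j) with i <= j counts the edges connecting sample i and sample j
counts <- unclass(table(factor(lo, levels = 1:3), factor(hi, levels = 1:3)))

R_W <- diag(counts)               # within-sample edge counts
R_B <- counts[upper.tri(counts)]  # between-sample edge counts
R_W
R_B
```

Since a spanning tree on n points has n - 1 edges, the entries of R_W and R_B always sum to n - 1. Under the alternative, samples that differ in distribution tend to produce more within-sample and fewer between-sample edges than expected under the null.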
This implementation is a wrapper around the function gtestsmulti that adapts the input and output of that function to match the other functions provided in this package. For more details, see the documentation of gtestsmulti.
An object of class htest with the following components:
statistic |
Observed value of the test statistic |
p.value |
Permutation p value (only if n.perm > 0) |
estimate |
Estimated KMD value |
alternative |
The alternative hypothesis |
method |
Description of the test |
data.name |
The dataset names |
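Components of an htest object are accessed with `$`, as for any list. The sketch below uses a base R test (stats::t.test) as a stand-in that also returns an htest object; it is assumed, not shown, that a result of SC is accessed in the same way.

```r
# Stand-in htest object from base R; an SC() result is assumed to behave the same way
set.seed(42)
res <- stats::t.test(rnorm(30), rnorm(30, mean = 0.5))

class(res)       # "htest"
res$statistic    # observed value of the test statistic
res$p.value      # p value
res$method       # description of the test

# Typical downstream use: compare the p value with a chosen significance level
reject <- res$p.value < 0.05
```

For SC, p.value is only available when a permutation test was requested, so code that relies on it should check that the component is present first.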
Target variable? | Numeric? | Categorical? | K-sample? |
No | Yes | No | Yes |
Song, H. and Chen, H. (2022). New graph-based multi-sample tests for high-dimensional and non-Euclidean data. doi:10.48550/arXiv.2205.13787
Stolte, M., Kappenberg, F., Rahnenführer, J., Bommert, A. (2024). Methods for quantifying dataset similarity: a review, taxonomy and comparison. Statist. Surv. 18, 163-298. doi:10.1214/24-SS149
gTestsMulti for performing both tests at once, MST
# Draw some data
X1 <- matrix(rnorm(1000), ncol = 10)
X2 <- matrix(rnorm(1000, mean = 0.5), ncol = 10)
# Perform Song and Chen test
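if (requireNamespace("gTestsMulti", quietly = TRUE)) {
  # Default statistic S with a permutation test based on 100 permutations
  SC(X1, X2, n.perm = 100)
  # Same test using the statistic S_A
  SC(X1, X2, n.perm = 100, type = "SA")
}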