| RISE | R Documentation |
Performs the Rank In Similarity Graph Edge-count two-sample test (RISE) for multivariate data (Zhou and Chen, 2023). The implementation here uses the RISE implementation from the GraphRankTest package.
RISE(X1, X2, sim.fun = function(x, ...) -as.matrix(stats::dist(x, ...)), K = 10,
rank.type = "RgNN", n.perm = 0, dist.args = NULL, seed = NULL)
X1 |
First dataset as matrix or data.frame |
X2 |
Second dataset as matrix or data.frame |
sim.fun |
Function for calculating a similarity matrix on the pooled dataset (default: negative value of the distance matrix computed by stats::dist) |
K |
Parameter K defining the length of the graph sequence (default: 10) |
rank.type |
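Character specifying the similarity graph and rank type (e.g. "RgNN" (default) for graph-induced ranks on the K-NNG, "RoNN" for overall ranks on the K-NNG, "RgMST" for graph-induced ranks on the K-MST) |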
n.perm |
Number of permutations for permutation test (default: 0, asymptotic test is performed). |
dist.args |
Named list of further arguments passed to sim.fun (default: NULL) |
seed |
Random seed (default: NULL). A random seed will only be set if one is provided. |
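Zhou and Chen (2023) define the following two graph-based rank matrices R = (R_{ij})_{i,j=1}^N for the pooled sample Z_1, \dots, Z_N, where the first n_1 observations come from X1, the remaining n_2 from X2, and N = n_1 + n_2. Both use a sequence of similarity graphs G_1, \dots, G_K constructed from the similarity matrix S of the pooled sample.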
The graph-induced ranks are defined as
R_{ij} = \sum_{l=1}^K \boldsymbol{1}\left(\left(i,j\right)\in G_l\right).
They can be interpreted as the number of graphs in the sequence that contain the edge (i,j).
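The graph-induced ranks can be illustrated with a short base-R sketch. It assumes, as one possible choice, that G_l is the directed l-nearest-neighbor graph under the similarity S, so that G_1 ⊂ G_2 ⊂ ... ⊂ G_K. The helper names knn_graph() and graph_induced_ranks() are hypothetical and are not functions of this package or of GraphRankTest.

```r
# Hypothetical helper (not a package function): directed l-nearest-neighbor
# graph for a similarity matrix S, returned as a logical adjacency matrix.
knn_graph <- function(S, l) {
  N <- nrow(S)
  A <- matrix(FALSE, N, N)
  for (i in seq_len(N)) {
    s <- S[i, ]
    s[i] <- -Inf                                   # exclude self-loops
    nn <- order(s, decreasing = TRUE)[seq_len(l)]  # l most similar points
    A[i, nn] <- TRUE
  }
  A
}

# Hypothetical helper: R_ij = number of graphs G_1, ..., G_K containing (i, j),
# assuming G_l is the l-NN graph.
graph_induced_ranks <- function(S, K) {
  R <- matrix(0, nrow(S), ncol(S))
  for (l in seq_len(K)) R <- R + knn_graph(S, l)   # add indicator of (i, j) in G_l
  R
}

# Toy pooled sample of four 1-D points; similarity = negative Euclidean distance
x <- c(0, 1, 3, 7)
S <- -as.matrix(stats::dist(x))
R <- graph_induced_ranks(S, K = 2)
R
```

Each row sums to K(K+1)/2 = 3 here, because every point's nearest neighbor is counted in all K graphs, its second-nearest neighbor in K - 1 graphs, and so on.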
The overall ranks are defined as
R_{ij} = \text{rank}\left(S\left(Z_i, Z_j\right), G_K\right),
where \text{rank}\left(S\left(Z_i, Z_j\right), G_K\right) denotes the rank of S\left(Z_i, Z_j\right) among the values \{S\left(Z_u, Z_v\right)\}_{(u,v)\in G_K} if (i,j)\in G_K, and zero otherwise.
The overall rank can thus be interpreted as the rank of the similarity of edge (i,j) among the similarities of all edges in G_K.
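A minimal sketch of the overall ranks, again assuming that G_K is the directed K-nearest-neighbor graph. It also assumes that ranks are assigned in ascending order of similarity (ties receive average ranks), so the most similar edge gets the largest rank. The helpers knn_graph() and overall_ranks() are hypothetical illustrations, not package functions.

```r
# Hypothetical helper (not a package function): directed K-NN graph G_K
# for a similarity matrix S, as a logical adjacency matrix.
knn_graph <- function(S, K) {
  A <- matrix(FALSE, nrow(S), ncol(S))
  for (i in seq_len(nrow(S))) {
    s <- S[i, ]
    s[i] <- -Inf                                    # exclude self-loops
    A[i, order(s, decreasing = TRUE)[seq_len(K)]] <- TRUE
  }
  A
}

# Hypothetical helper: rank of S(Z_i, Z_j) among all edge similarities in G_K
# (more similar = higher rank, ties averaged); zero for pairs not in G_K.
overall_ranks <- function(S, K) {
  G <- knn_graph(S, K)
  R <- matrix(0, nrow(S), ncol(S))
  R[G] <- rank(S[G])                                # same element order on both sides
  R
}

x <- c(0, 1, 3, 7)
S <- -as.matrix(stats::dist(x))
R <- overall_ranks(S, K = 1)
R
```

With K = 1 the graph has four edges with similarities -1, -1, -2 and -4, so the two tied closest pairs share rank 3.5 and the most distant edge gets rank 1.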
Both rank definitions depend on the choice of the parameter K that defines the length of the graph sequence.
For the test, the symmetrized rank matrix \frac{1}{2}\left(R + R^T\right) is used; for convenience, it is again denoted by R.
For the test statistic, the within-sample rank sums of the first and second sample are defined as
U_{X1} = \sum_{i,j=1}^{n_1} R_{ij}, \quad U_{X2} = \sum_{i,j=n_1 + 1}^{N} R_{ij}.
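The symmetrization and the two within-sample rank sums translate directly into base R. This is a sketch with a hypothetical helper name, rank_sums(), applied to a small hand-written rank matrix. It assumes that the first n1 rows and columns belong to X1.

```r
# Hypothetical helper: symmetrize a rank matrix and compute the within-sample
# rank sums U_X1 (first n1 observations) and U_X2 (remaining N - n1).
rank_sums <- function(R, n1) {
  R <- (R + t(R)) / 2                 # symmetrized rank matrix
  N <- nrow(R)
  idx1 <- seq_len(n1)
  idx2 <- (n1 + 1):N
  c(U_X1 = sum(R[idx1, idx1]), U_X2 = sum(R[idx2, idx2]))
}

# Toy rank matrix for N = 4 observations, the first n1 = 2 from X1
R <- matrix(c(0, 2, 1, 0,
              2, 0, 1, 0,
              1, 2, 0, 0,
              0, 1, 2, 0), nrow = 4, byrow = TRUE)
rank_sums(R, n1 = 2)
```

Because the double sum runs over all ordered pairs (i, j), each within-sample pair of the symmetrized matrix is counted twice, which gives U_X1 = 4 and U_X2 = 2 for this toy matrix.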
Using these, the rank in similarity graph edge-count two-sample test (RISE) statistic is defined as
T_R = (U_{X1} - \mu_{X1}, U_{X2} - \mu_{X2})\Sigma^{-1}(U_{X1} - \mu_{X1}, U_{X2} - \mu_{X2})^T,
where \mu_{X1} = \mathbb{E}(U_{X1}), \mu_{X2} = \mathbb{E}(U_{X2}), and \Sigma = \mathbb{C}\text{ov}((U_{X1}, U_{X2})^T) can be calculated explicitly under the permutation null hypothesis.
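The explicit formulas for \mu_{X1}, \mu_{X2} and \Sigma are not reproduced in this help page. The sketch below therefore replaces them with a different technique: a Monte Carlo approximation over random relabelings of the pooled sample. It only illustrates the quadratic form T_R and is not the package's computation. The names gnn_ranks(), rank_sums() and rise_stat() are hypothetical, and the K-NN graph sequence is again an assumption.

```r
# Hypothetical helper: graph-induced ranks on nested l-NN graphs, l = 1..K.
# The nearest neighbor of i lies in all K graphs (rank K), the K-th nearest
# neighbor only in G_K (rank 1); all other pairs get rank 0.
gnn_ranks <- function(S, K) {
  N <- nrow(S)
  R <- matrix(0, N, N)
  for (i in seq_len(N)) {
    s <- S[i, ]
    s[i] <- -Inf                                    # exclude self-loops
    R[i, order(s, decreasing = TRUE)[seq_len(K)]] <- K:1
  }
  R
}

# Within-sample rank sums (U_X1, U_X2) for a logical label vector
rank_sums <- function(R, in_x1) {
  c(sum(R[in_x1, in_x1]), sum(R[!in_x1, !in_x1]))
}

# Hypothetical rise_stat(): T_R with mu and Sigma approximated by Monte Carlo
# relabelings of the pooled sample instead of the explicit formulas.
rise_stat <- function(R, n1, n.mc = 2000) {
  R <- (R + t(R)) / 2                               # symmetrize
  in_x1 <- seq_len(nrow(R)) <= n1
  U <- rank_sums(R, in_x1)
  U_perm <- t(replicate(n.mc, rank_sums(R, sample(in_x1))))  # n.mc x 2
  mu <- colMeans(U_perm)
  Sigma <- stats::cov(U_perm)
  d <- U - mu
  drop(t(d) %*% solve(Sigma, d))                    # quadratic form
}

set.seed(1)
X1 <- matrix(rnorm(100), ncol = 2)
X2 <- matrix(rnorm(100, mean = 1), ncol = 2)
S <- -as.matrix(stats::dist(rbind(X1, X2)))
T_R <- rise_stat(gnn_ranks(S, K = 5), n1 = nrow(X1))
T_R
```

With a mean shift of 1 in both coordinates, T_R is typically far above 5.99, the 95% quantile of the \chi^2_2 distribution.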
For small samples, the exact permutation null distribution can be used for testing.
For large samples and under certain conditions on the similarity graphs, T_R is asymptotically \chi^2_2-distributed under the null hypothesis, and this limiting distribution can be used for testing.
High values of the test statistic indicate dissimilarity of the datasets. Thus, the null hypothesis of equal distributions is rejected for large values.
For n.perm = 0, an asymptotic test using the asymptotic \chi^2_2 approximation of the null distribution is performed. For n.perm > 0, a permutation test is performed.
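The two decision rules can be sketched as follows. The asymptotic p value is the upper tail of the \chi^2_2 distribution. For the permutation p value, the sketch assumes the common "+1" convention that counts the observed statistic among the permuted ones; GraphRankTest may use a different convention. Both function names are hypothetical.

```r
# Hypothetical helper: asymptotic p value, upper tail of chi-squared with 2 df
p_asymptotic <- function(T_obs) {
  stats::pchisq(T_obs, df = 2, lower.tail = FALSE)
}

# Hypothetical helper: permutation p value from statistics recomputed on
# n.perm relabelings; the +1 terms count the observed value itself.
p_permutation <- function(T_obs, T_perm) {
  (1 + sum(T_perm >= T_obs)) / (1 + length(T_perm))
}

p_asymptotic(10)                  # about 0.0067
p_permutation(5, c(1, 2, 6, 7))   # (1 + 2) / (1 + 4) = 0.6
```

For two degrees of freedom, the upper tail has the closed form exp(-T/2). Large values of T_R therefore give small p values, which matches rejecting the null hypothesis for large values of the statistic.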
This implementation is a wrapper around the function RISE from the GraphRankTest package that adapts its input and output to match the other functions provided in this package. For more details, see the documentation of GraphRankTest::RISE.
An object of class htest with the following components:
statistic |
Observed value of the test statistic |
parameter |
Degrees of freedom for asymptotic test |
p.value |
Asymptotic or permutation p value |
alternative |
The alternative hypothesis |
method |
Description of the test |
data.name |
The dataset names |
| Target variable? | Numeric? | Categorical? | K-sample? |
|---|---|---|---|
| No | Yes | No | No |
Because this method cannot handle missing data, any missing values are removed automatically and a warning is issued.
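One way to mimic the described handling of missing data is to drop incomplete rows and issue a warning. This sketch assumes that whole rows (observations) are removed. The helper remove_incomplete() is hypothetical, and the package's internal code may differ.

```r
# Hypothetical helper: drop rows containing any missing value and warn,
# mimicking the behavior described above (not the package's own code).
remove_incomplete <- function(X) {
  X <- as.matrix(X)
  keep <- stats::complete.cases(X)
  if (!all(keep)) {
    warning(sum(!keep), " row(s) with missing values removed.")
  }
  X[keep, , drop = FALSE]
}

# Third row contains an NA and is removed (with a warning)
X1 <- matrix(c(1, 2, NA, 4, 5, 6), ncol = 2)
X1_clean <- suppressWarnings(remove_incomplete(X1))
X1_clean
```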
Zhou, D. and Chen, H. (2023). A new ranking scheme for modern data and its application to two-sample hypothesis testing. In Proceedings of the 36th Annual Conference on Learning Theory (COLT 2023), PMLR, pp. 3615–3668.
Stolte, M., Kappenberg, F., Rahnenführer, J. and Bommert, A. (2024). Methods for quantifying dataset similarity: a review, taxonomy and comparison. Statist. Surv. 18, 163–298. doi:10.1214/24-SS149
set.seed(1234)
# Draw some data
X1 <- matrix(rnorm(1000), ncol = 10)
X2 <- matrix(rnorm(1000, mean = 0.5), ncol = 10)
# Perform RISE
if (requireNamespace("GraphRankTest", quietly = TRUE)) {
  # Results are printed explicitly: inside a block, only the last value
  # would be auto-printed
  # Using 10-NNG and graph-induced ranks
  print(RISE(X1, X2))
  # Using 10-NNG and overall ranks
  print(RISE(X1, X2, rank.type = "RoNN"))
  # Using 5-MST and graph-induced ranks
  print(RISE(X1, X2, K = 5, rank.type = "RgMST"))
}