| Petrie | R Documentation |
Performs the multisample crossmatch (MCM) test (Petrie, 2016).
Petrie(X1, X2, ..., dist.fun = stats::dist, dist.args = NULL, seed = NULL,
shuffle = FALSE)
X1 |
First dataset as matrix or data.frame |
X2 |
Second dataset as matrix or data.frame |
... |
Optionally more datasets as matrices or data.frames |
dist.fun |
Function for calculating a distance matrix on the pooled dataset (default: stats::dist) |
dist.args |
Named list of further arguments passed to dist.fun (default: NULL) |
seed |
Random seed (default: NULL). A random seed will only be set if one is provided. |
shuffle |
Logical indicator specifying whether the pooled sample should be randomly permuted before matching in the presence of ties (default: FALSE). |
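As an illustration of how dist.fun and dist.args fit together, the following minimal sketch builds a Manhattan distance matrix on a pooled sample with base R. The do.call line is only an assumption about how the arguments are combined internally, and the package name DataSimilarity in the guarded call is likewise an assumption that does not appear on this page.

```r
set.seed(1234)
X1 <- matrix(rnorm(200), ncol = 4)              # 50 observations, 4 variables
X2 <- matrix(rnorm(200, mean = 0.5), ncol = 4)  # 50 observations, 4 variables
pooled <- rbind(X1, X2)

dist.fun  <- stats::dist
dist.args <- list(method = "manhattan")

# Assumption: the distance function is applied to the pooled data roughly like this
D <- do.call(dist.fun, c(list(pooled), dist.args))

# The same distances, computed with a direct call
D_direct <- stats::dist(pooled, method = "manhattan")

# Guarded call; the package name is an assumption
if (requireNamespace("DataSimilarity", quietly = TRUE) &&
    requireNamespace("nbpMatching", quietly = TRUE)) {
  DataSimilarity::Petrie(X1, X2, dist.args = list(method = "manhattan"))
}
```

Any function that accepts the pooled data as its first argument and returns a distance matrix should be usable in the same way, provided its extra arguments are supplied as a named list.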
The test is an extension of the Rosenbaum (2005) crossmatch test to multiple samples that uses the crossmatch count of all pairs of samples.
The observed cross-counts are calculated using the functions distancematrix and nonbimatch from the nbpMatching package.
High values of the multisample crossmatch statistic indicate similarity between the datasets. Thus, the test rejects the null hypothesis of equal distributions for low values of the test statistic.
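To make the notion of cross-counts concrete, here is a small base R sketch that tabulates, for a given pairing of observations, how many matched pairs fall between each pair of samples. The helper cross_counts and the hand-written partner vector are hypothetical and serve only as illustration; they are not the package's implementation and do not use nbpMatching.

```r
# Hypothetical helper: count matched pairs by the samples of their two members.
# labels:  sample membership of each pooled observation
# partner: partner[i] is the index of the observation matched to observation i
cross_counts <- function(labels, partner) {
  labels <- as.factor(labels)
  idx <- seq_along(partner)
  once <- idx < partner                      # visit each matched pair only once
  g1 <- as.integer(labels[idx[once]])
  g2 <- as.integer(labels[partner[once]])
  lev <- levels(labels)
  # Record each pair in the upper triangle (smaller sample index first)
  table(factor(lev[pmin(g1, g2)], levels = lev),
        factor(lev[pmax(g1, g2)], levels = lev))
}

labels  <- c("A", "A", "B", "B", "C", "C")
partner <- c(2, 1, 5, 6, 3, 4)               # pairs (1,2), (3,5), (4,6)

cc <- cross_counts(labels, partner)
cc
```

The diagonal holds pairs matched within a sample, and the entries above the diagonal hold the crossmatches between two different samples.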
Note that, due to the specific implementation of the non-bipartite matching algorithm, inflated cross-counts were observed in simulations under the null scenario when the distance matrix contained many ties (see the note below). The option to shuffle the dataset before matching was therefore introduced to circumvent these inflated cross-counts. A warning is thrown if any ties in the distance matrix are detected.
An object of class htest with the following components:
statistic |
Observed value of the test statistic |
p.value |
Asymptotic p value |
estimate |
Observed multisample edge-count |
alternative |
The alternative hypothesis |
method |
Description of the test |
data.name |
The dataset names |
stderr |
Standard deviation under the null |
mu0 |
Expectation under the null |
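Because the result is an htest object, its components can be read with the usual $ accessor. The sketch below defines a hypothetical helper htest_summary and demonstrates it on a base R t-test, which also returns an htest object. The guarded block then applies it to this function, assuming it is available from a package named DataSimilarity (a name not stated on this page).

```r
# Hypothetical helper: pull the test statistic and p value out of any htest object
htest_summary <- function(res) {
  stopifnot(inherits(res, "htest"))
  c(statistic = unname(res$statistic), p.value = res$p.value)
}

set.seed(1)
demo <- stats::t.test(rnorm(20), rnorm(20, mean = 1))
out <- htest_summary(demo)

# Guarded call; the package name is an assumption
if (requireNamespace("DataSimilarity", quietly = TRUE) &&
    requireNamespace("nbpMatching", quietly = TRUE)) {
  X1 <- matrix(rnorm(1000), ncol = 10)
  X2 <- matrix(rnorm(1000, mean = 0.5), ncol = 10)
  res <- DataSimilarity::Petrie(X1, X2)
  print(htest_summary(res))
  print(res$mu0)     # expectation under the null
  print(res$stderr)  # standard deviation under the null
}
```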
| Target variable? | Numeric? | Categorical? | K-sample? |
| No | Yes | Yes | Yes |
In case of ties in the distance matrix, the optimal non-bipartite matching might not be defined uniquely.
Here, the observations are matched in the order in which the samples are supplied.
When searching for a match, the implementation starts at the end of the pooled sample.
Therefore, with many ties (e.g. for categorical data), observations from the first dataset are often matched with ones from the last dataset and so on.
This might affect the validity of the test negatively. For this reason, a warning is issued whenever ties are detected.
As a workaround, users can set shuffle = TRUE in the function call, which randomly permutes the pooled sample before matching.
When using this option, it is strongly recommended to set a random seed to ensure reproducibility.
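The following sketch shows why ties arise for discrete data and what a random permutation of the pooled sample looks like in base R. The permutation step only illustrates the idea behind shuffle = TRUE and is not the package's own code; the package name DataSimilarity in the guarded call is an assumption.

```r
set.seed(42)
# Binary data: only a few distinct distance values are possible, so ties are unavoidable
X1 <- matrix(rbinom(60, 1, 0.5), ncol = 3)   # 20 observations
X2 <- matrix(rbinom(60, 1, 0.6), ncol = 3)   # 20 observations
pooled <- rbind(X1, X2)
labels <- rep(c(1, 2), times = c(nrow(X1), nrow(X2)))

D <- stats::dist(pooled)
has_ties <- anyDuplicated(as.vector(D)) > 0  # TRUE if any distance value repeats

# Randomly permute the pooled sample, keeping the sample labels aligned
perm <- sample(nrow(pooled))
pooled_shuffled <- pooled[perm, , drop = FALSE]
labels_shuffled <- labels[perm]

# Guarded call with shuffling and a fixed seed; the package name is an assumption
if (requireNamespace("DataSimilarity", quietly = TRUE) &&
    requireNamespace("nbpMatching", quietly = TRUE)) {
  DataSimilarity::Petrie(X1, X2, shuffle = TRUE, seed = 42)
}
```

Passing the seed through the seed argument keeps the permutation, and hence the matching, reproducible across runs.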
Because this method cannot handle missing data, any missing values are removed automatically and a warning is issued.
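To avoid the warning, missing values can be handled explicitly before calling the function. The sketch below drops incomplete rows with stats::complete.cases. Row-wise removal is an assumption here, since this page does not state exactly how missing values are removed.

```r
X <- data.frame(a = c(1, NA, 3, 4),
                b = c(2, 5, NA, 8))

# Keep only rows without any missing value (assumed to mirror the automatic removal)
X_complete <- X[stats::complete.cases(X), , drop = FALSE]
X_complete
```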
Mukherjee, S., Agarwal, D., Zhang, N. R. and Bhattacharya, B. B. (2022). Distribution-Free Multisample Tests Based on Optimal Matchings With Applications to Single Cell Genomics. Journal of the American Statistical Association, 117(538), 627-638, doi:10.1080/01621459.2020.1791131
Rosenbaum, P. R. (2005). An Exact Distribution-Free Test Comparing Two Multivariate Distributions Based on Adjacency. Journal of the Royal Statistical Society. Series B (Statistical Methodology), 67(4), 515-530.
Petrie, A. (2016). Graph-theoretic multisample tests of equality in distribution for high dimensional data. Computational Statistics & Data Analysis, 96, 145-158, doi:10.1016/j.csda.2015.11.003
Stolte, M., Kappenberg, F., Rahnenführer, J., Bommert, A. (2024). Methods for quantifying dataset similarity: a review, taxonomy and comparison. Statist. Surv. 18, 163-298, doi:10.1214/24-SS149
Stolte, M., Rahnenführer, J., Bommert, A. (2026). An Empirical Comparison of Methods for Quantifying the Similarity of Numeric Datasets. doi:10.48550/arXiv.2604.12327
Stolte, M., Rahnenführer, J., Bommert, A. (2026). An Empirical Comparison of Methods for Quantifying the Similarity of Categorical Datasets. doi:10.48550/arXiv.2604.11458
MMCM, Rosenbaum
set.seed(1234)
# Draw some data
X1 <- matrix(rnorm(1000), ncol = 10)
X2 <- matrix(rnorm(1000, mean = 0.5), ncol = 10)
# Perform MCM test
if (requireNamespace("nbpMatching", quietly = TRUE)) {
  Petrie(X1, X2)
}