mram: Estimate the Multivariate Regression Association Measure
In MRAM: Multivariate Regression Association Measure

mram	R Documentation

Estimate the Multivariate Regression Association Measure

Description

Compute the estimate T_n using the nearest neighbor method and, optionally, its standard error estimates using the m-out-of-n bootstrap.

Usage

mram(
  y_data,
  x_data,
  z_data = NULL,
  bootstrap = FALSE,
  B = 1000,
  g_vec = seq(0.4, 0.9, by = 0.05)
)

Arguments

`y_data`	An `n \times d` matrix of responses, where `n` is the sample size.
`x_data`	An `n \times p` matrix of predictors.
`z_data`	An `n \times q` matrix of conditioning predictors. The default value is `NULL`.
`bootstrap`	Perform the `m`-out-of-`n` bootstrap if `TRUE`. The default value is `FALSE`.
`B`	Number of bootstrap replications. The default value is `1000`.
`g_vec`	A vector of candidate values for `\gamma` between 0 and 1, used to generate a collection of rules for the `m`-out-of-`n` bootstrap. The default value is `seq(0.4, 0.9, by = 0.05)`.

Details

Let \{({\bf X}_i,{\bf Y}_i,{\bf Z}_i)\}_{i = 1}^n be independent and identically distributed data from the population ({\bf X},{\bf Y},{\bf Z}). The estimate T_n({\bf X},{\bf Y}) for the unconditional measure (z_data = NULL) is given as

T_n({\bf X},{\bf Y}) = \binom{n}{2}^{-1} \sum_{i < j} \langle S({{\bf Y}_i - {\bf Y}_j}), S({{\bf Y}_{N(i)} - {\bf Y}_{N(j)}}) \rangle,

where \langle \cdot, \cdot \rangle is the dot product, S(\cdot) is the spatial sign function, and N(i) is the index j such that {\bf X}_j is the nearest neighbor of {\bf X}_i according to the Euclidean distance. The estimate T_n({\bf X},{\bf Y} \mid {\bf Z}) for the conditional measure is given as

T_n({\bf X},{\bf Y} \mid {\bf Z} ) = \frac{T_n(({\bf X},{\bf Z}),{\bf Y} ) - T_n({\bf Z},{\bf Y} )}{1 - T_n({\bf Z},{\bf Y} )}.

See Shih and Chen (2026) for more details.
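
The two estimators displayed above can be sketched in base R as follows. This is an illustrative sketch under stated assumptions, not the package's implementation: the helper names `spatial_sign`, `nn_index`, `t_n_sketch`, and `t_n_cond_sketch` are hypothetical, ties among nearest neighbors are assumed to be broken by taking the first minimum, and the double loop costs O(n^2), so it is only practical for small n.

```r
# Spatial sign S(v) = v / ||v||, with the zero vector mapped to itself.
spatial_sign <- function(v) {
  nv <- sqrt(sum(v^2))
  if (nv == 0) v else v / nv
}

# N(i): index of the Euclidean nearest neighbor of each row of x, excluding i.
# Assumption: ties are broken by taking the first minimum.
nn_index <- function(x) {
  d <- as.matrix(dist(as.matrix(x)))
  diag(d) <- Inf
  unname(apply(d, 1, which.min))
}

# Unconditional estimate T_n(X, Y): average over pairs i < j of the dot product
# between S(Y_i - Y_j) and S(Y_N(i) - Y_N(j)).
t_n_sketch <- function(y, x) {
  y <- as.matrix(y)
  n <- nrow(y)
  nn <- nn_index(x)
  total <- 0
  for (i in 1:(n - 1)) {
    for (j in (i + 1):n) {
      total <- total + sum(spatial_sign(y[i, ] - y[j, ]) *
                             spatial_sign(y[nn[i], ] - y[nn[j], ]))
    }
  }
  total / choose(n, 2)
}

# Conditional estimate T_n(X, Y | Z), built from two unconditional estimates.
t_n_cond_sketch <- function(y, x, z) {
  t_xz <- t_n_sketch(y, cbind(x, z))
  t_z <- t_n_sketch(y, z)
  (t_xz - t_z) / (1 - t_z)
}

# Small simulated example (the sample size and model are illustrative).
set.seed(1)
n <- 50
x <- matrix(rnorm(n * 2), n, 2)
y <- cbind(x[, 1] * x[, 2] + x[, 1] + rnorm(n),
           x[, 1] * x[, 2] - x[, 1] + rnorm(n))
t_uncond <- t_n_sketch(y, x[, 1])
t_cond <- t_n_cond_sketch(y, x[, 1], x[, 2])
```

Because every summand is a dot product of vectors with norm at most one, the unconditional value from this sketch always lies between -1 and 1. Results need not match `mram` exactly, since the package may handle ties and edge cases differently.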

For the m-out-of-n bootstrap, the rule (resample size) is set to m = \lfloor n^\gamma \rfloor, where \lfloor x \rfloor denotes the largest integer less than or equal to x, and \gamma, with 0 < \gamma < 1, takes values from the vector g_vec. The recommended standard error estimate is T_se_cluster, which is obtained from the cluster rule. See Dette and Kroll (2024) for more details.
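
As a small illustration of the rule m = \lfloor n^\gamma \rfloor, the sketch below lists the candidate resample sizes for the default g_vec. The sample size n = 100 is an assumed value taken from the Examples, the variable names are illustrative, and the package may compute m_vec differently internally.

```r
# Assumed sample size (matches the Examples section).
n <- 100

# Default candidate values of gamma, as in the Usage section.
g_vec <- seq(0.4, 0.9, by = 0.05)

# One candidate resample size per gamma: m = floor(n^gamma).
m_vec <- floor(n^g_vec)

# Show each gamma next to its resample size.
print(data.frame(gamma = g_vec, m = m_vec))
```

Every candidate m is smaller than n, and m grows with \gamma, so the vector g_vec spans a range from small to moderately large resample sizes.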

The mram function is used by the vs_mram function for variable selection.

Value

`T_est`	The estimate of the multivariate regression association measure. The estimate lies between `-1` and `1` in finite samples, although it lies between `0` and `1` asymptotically. A small value indicates that `x_data` has low predictability for `y_data` conditional on `z_data` in the sense of the considered measure, whereas a large value indicates that `x_data` has high predictability for `y_data` conditional on `z_data`. If `z_data = NULL`, the returned value indicates the unconditional predictability.
`T_se_cluster`	The standard error estimate based on the cluster rule.
`m_vec`	The vector of `m` generated by `g_vec`.
`T_se_vec`	The vector of standard error estimates obtained from the `m`-out-of-`n` bootstrap, one for each resample size `m` in `m_vec`.
`J_cluster`	The index of the element of `m_vec` chosen by the cluster rule.

References

Dette and Kroll (2024) A Simple Bootstrap for Chatterjee’s Rank Correlation, Biometrika, asae045.

Shih and Chen (2026) Measuring multivariate regression association via spatial sign, Computational Statistics & Data Analysis, 215, 108288.

Examples

library(MRAM)

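# sample size
n = 100

# simulate two predictors and a bivariate response that depends on both
set.seed(1)
x_data = matrix(rnorm(n*2),n,2)
y_data = matrix(0,n,2)
y_data[,1] = x_data[,1]*x_data[,2]+x_data[,1]+rnorm(n)
y_data[,2] = x_data[,1]*x_data[,2]-x_data[,1]+rnorm(n)

# conditional measures: each predictor given the other
mram(y_data,x_data[,1],x_data[,2])
mram(y_data,x_data[,2],x_data[,1])
# unconditional measures (z_data = NULL)
mram(y_data,x_data[,1])
mram(y_data,x_data[,2])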

## Not run: 

# perform the m-out-of-n bootstrap
mram(y_data,x_data[,1],x_data[,2],bootstrap = TRUE)
mram(y_data,x_data[,2],x_data[,1],bootstrap = TRUE)
mram(y_data,x_data[,1],bootstrap = TRUE)
mram(y_data,x_data[,2],bootstrap = TRUE)

## End(Not run)

MRAM documentation built on Jan. 8, 2026, 1:08 a.m.