mram: Estimate the Multivariate Regression Association Measure

Description

Compute the estimate T_n of the multivariate regression association measure using the nearest neighbor method and, optionally, its standard error estimates using the m-out-of-n bootstrap.

Usage

mram(
  y_data,
  x_data,
  z_data = NULL,
  bootstrap = FALSE,
  B = 1000,
  g_vec = seq(0.4, 0.9, by = 0.05)
)

Arguments

y_data

An n \times d matrix of responses, where n is the sample size.

x_data

An n \times p matrix of predictors.

z_data

An n \times q matrix of conditioning predictors. The default value is NULL, which gives the unconditional measure.

bootstrap

Perform the m-out-of-n bootstrap if TRUE. The default value is FALSE.

B

Number of bootstrap replications. The default value is 1000.

g_vec

A vector of candidate values of \gamma, each strictly between 0 and 1, used to generate a collection of rules (resample sizes) for the m-out-of-n bootstrap. The default value is seq(0.4, 0.9, by = 0.05).

Details

Let \{({\bf X}_i,{\bf Y}_i,{\bf Z}_i)\}_{i = 1}^n be independent and identically distributed data from the population ({\bf X},{\bf Y},{\bf Z}). The estimate T_n({\bf X},{\bf Y}) for the unconditional measure (z_data = NULL) is given as

T_n({\bf X},{\bf Y}) = \binom{n}{2}^{-1} \sum_{i < j} \langle S({{\bf Y}_i - {\bf Y}_j}), S({{\bf Y}_{N(i)} - {\bf Y}_{N(j)}}) \rangle,

where \langle \cdot, \cdot \rangle is the dot product, S(\cdot) is the spatial sign function, S({\bf y}) = {\bf y}/\|{\bf y}\| for {\bf y} \neq {\bf 0} and S({\bf 0}) = {\bf 0}, and N(i) is the index j \neq i such that {\bf X}_j is the nearest neighbor of {\bf X}_i in Euclidean distance. The estimate T_n({\bf X},{\bf Y} \mid {\bf Z}) for the conditional measure is given as

T_n({\bf X},{\bf Y} \mid {\bf Z} ) = \frac{T_n(({\bf X},{\bf Z}),{\bf Y} ) - T_n({\bf Z},{\bf Y} )}{1 - T_n({\bf Z},{\bf Y} )}.

See Shih and Chen (2026) for more details.
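The two formulas can be traced step by step in a minimal base-R sketch. It is written for illustration only and is not the MRAM implementation: the helper names spatial_sign, nn_index, tn_sketch and tn_cond_sketch are hypothetical, and breaking ties among nearest-neighbor distances by taking the first minimizer is an assumption, since the text does not say how ties are handled.

```r
# Spatial sign: S(v) = v / ||v|| for v != 0, and S(0) = 0.
spatial_sign <- function(v) {
  nv <- sqrt(sum(v^2))
  if (nv == 0) v else v / nv
}

# N(i): index of the Euclidean nearest neighbor of row i of x, excluding i.
# Assumption of this sketch: ties are broken by the first minimizer.
nn_index <- function(x) {
  d <- as.matrix(dist(as.matrix(x)))
  diag(d) <- Inf
  apply(d, 1, which.min)
}

# Unconditional estimate T_n(X, Y): average over pairs i < j of
# <S(Y_i - Y_j), S(Y_N(i) - Y_N(j))>.
tn_sketch <- function(y, x) {
  y <- as.matrix(y)
  n <- nrow(y)
  nn <- nn_index(x)
  total <- 0
  for (i in 1:(n - 1)) {
    for (j in (i + 1):n) {
      total <- total + sum(spatial_sign(y[i, ] - y[j, ]) *
                           spatial_sign(y[nn[i], ] - y[nn[j], ]))
    }
  }
  total / choose(n, 2)
}

# Conditional estimate T_n(X, Y | Z) built from two unconditional estimates.
tn_cond_sketch <- function(y, x, z) {
  t_xz <- tn_sketch(y, cbind(x, z))
  t_z <- tn_sketch(y, z)
  (t_xz - t_z) / (1 - t_z)
}
```

For example, tn_sketch(y_data, x_data[, 1]) evaluates the unconditional estimate on the simulated data from the Examples section. Both the distance matrix and the pair loop cost O(n^2), which is acceptable for illustration but not tuned for speed.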

For the m-out-of-n bootstrap, the rule (resample size) is m = \lfloor n^\gamma \rfloor, where \lfloor x \rfloor denotes the largest integer less than or equal to x and 0 < \gamma < 1 ranges over the values in g_vec. The recommended standard error estimate is T_se_cluster, which is based on the cluster rule; see Dette and Kroll (2024) for details.
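The rule can be tabulated directly. The minimal base-R sketch below computes m_vec for n = 100 and the default g_vec, then defines a generic m-out-of-n bootstrap standard error for an arbitrary statistic. The function m_out_of_n_se and its arguments are hypothetical. Resampling with replacement by default and the sqrt(m/n) rescaling, which presumes a root-n convergence rate, are assumptions of this sketch rather than documented details of mram. The cluster rule of Dette and Kroll (2024) for choosing among the m values is not reproduced.

```r
# Resample sizes m = floor(n^gamma), one per candidate gamma in g_vec.
n <- 100
g_vec <- seq(0.4, 0.9, by = 0.05)
m_vec <- floor(n^g_vec)

# Hypothetical generic m-out-of-n bootstrap standard error of stat(y, x).
# Assumptions (not taken from MRAM): rows are resampled jointly, with
# replacement unless replace = FALSE, and the standard deviation of the
# size-m replicates is rescaled by sqrt(m / n) assuming a root-n rate.
m_out_of_n_se <- function(y, x, stat, m, B = 1000, replace = TRUE) {
  y <- as.matrix(y)
  x <- as.matrix(x)
  n <- nrow(y)
  reps <- replicate(B, {
    idx <- sample.int(n, m, replace = replace)
    stat(y[idx, , drop = FALSE], x[idx, , drop = FALSE])
  })
  sqrt(m / n) * sd(reps)
}
```

With n = 100, the default g_vec gives resample sizes ranging from 6 (\gamma = 0.4) to 63 (\gamma = 0.9). Calling m_out_of_n_se once for each element of m_vec yields a vector of standard error estimates, which is the role T_se_vec plays in the returned value.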

The mram function is called by vs_mram for variable selection.

Value

T_est

The estimate of the multivariate regression association measure. In finite samples T_est lies between -1 and 1, although asymptotically it lies between 0 and 1. A small value indicates that x_data has low predictability for y_data conditional on z_data, in the sense of the considered measure; a large value indicates that x_data has high predictability for y_data conditional on z_data. If z_data = NULL, the returned value measures unconditional predictability.

T_se_cluster

The standard error estimate based on the cluster rule.

m_vec

The vector of resample sizes m = \lfloor n^\gamma \rfloor, one for each value of \gamma in g_vec.

T_se_vec

The vector of standard error estimates obtained from the m-out-of-n bootstrap, one for each resample size in m_vec.

J_cluster

The index of the element of m_vec selected by the cluster rule.

References

Dette and Kroll (2024). A simple bootstrap for Chatterjee’s rank correlation. Biometrika, asae045.

Shih and Chen (2026). Measuring multivariate regression association via spatial sign. Computational Statistics & Data Analysis, 215, 108288.

See Also

vs_mram

Examples

library(MRAM)

n = 100

# simulate two predictors and a bivariate response
set.seed(1)
x_data = matrix(rnorm(n*2),n,2)
y_data = matrix(0,n,2)
y_data[,1] = x_data[,1]*x_data[,2]+x_data[,1]+rnorm(n)
y_data[,2] = x_data[,1]*x_data[,2]-x_data[,1]+rnorm(n)

# conditional measures: X1 given X2, then X2 given X1
mram(y_data,x_data[,1],x_data[,2])
mram(y_data,x_data[,2],x_data[,1])
# unconditional measures of X1 and of X2
mram(y_data,x_data[,1])
mram(y_data,x_data[,2])

## Not run: 

# perform the m-out-of-n bootstrap
mram(y_data,x_data[,1],x_data[,2],bootstrap = TRUE)
mram(y_data,x_data[,2],x_data[,1],bootstrap = TRUE)
mram(y_data,x_data[,1],bootstrap = TRUE)
mram(y_data,x_data[,2],bootstrap = TRUE)

## End(Not run)

MRAM documentation built on Jan. 8, 2026, 1:08 a.m.