testIAD_mpfr: Model-agnostic interaction difference test for prediction...

View source: R/InteractionTest.R

testIAD_mpfrR Documentation

Model-agnostic interaction difference test for prediction models

Description

Tests if a given set s of covariates contributes to interaction effects in the prediction model.

Usage

testIAD_mpfr(
  colInd,
  object,
  predictfun,
  X,
  newX = NULL,
  y = NULL,
  alternative = "twoSided",
  centering = FALSE,
  nCoresPDP = 1,
  precBits = 53 * 2
)

Arguments

colInd

Index of columns of covariates to specify the null hypothesis set s (integer vector).

object

Prediction model object (class flexible).

predictfun

Prediction function to be evaluated (class function). The prediction function needs to be specified with two arguments predictfun(object, X). The argument object is the prediction model and X the data on which the partial dependence functions are evaluated.

X

Data on that the partial dependence function is evaluated (class matrix or data.frame). The structure of the data depends on the specified argument predictfun.

newX

Test data set, which is used in the evaluation of the partial dependence functions (class "matrix" or "data.frame"). Default value NULL evaluates the interaction test on training data X.

y

Response (numeric vector). Default value NULL means that the statistic is not based on response information.

alternative

Should the hypothesis be tested two-sided alternative="twoSided", greater alternative="greater" or less alternative="less" (character vector)? Default is "twoSided".

centering

Should the resulting values be mean centered? (logical scalar). Default corresponds to output original values.

nCoresPDP

Number of cores used in standard parallel computation setup based on R parallel package to compute partial dependence function. The default value of one uses serial processing across observations.

precBits

Numerical precision that are used in computation after the calculation of the predictions from the estimated model. Default is defined to be double the amount of the 53 Bits usually used in R.

Details

The data set used to evaluate the interaction test coulbe be training or test data. The proof of the asymptotic distribution only works in case of test data. Therefore we recommend usage of test data for evaluation. Two types of hypothesis are recommended:

  • Model interactions: Model does not have interaction effects between covariates in set s and between other covariates and those of set s. Argument colInd specifies the set s and argument alternative should be set to "twoSided".

  • MSE improvement: Do interactions between between covariates in set s and between other covariates and those of set s descrease MSE? Argument alternative should be set to "less".

Value

List with following entries:

  • testStat: Test statistic based on one-sample Gaussian test.

  • pValue: P-value based on asymptotic normal distribution.

  • IAD_f_terms: If y=NULL then the terms used to calculate the variance of the predictions with possible interaction effects in set s are returned. If y!=NULL, then the terms of the MSE of the prediction model with interactions is returned.

  • IAD_f: If y=NULL then the variance of the predictions with possible interaction effects in set s is returned. If y!=NULL, then the MSE of the prediction model with possible interactions is returned.

  • IAD_PD_terms: If y=NULL then the terms used to calculate the variance of predictions under the null hypothesis of no interaction effects in set s are returned. If y!=NULL, then the terms used to calculate the MSE of the prediction model are returned. under the null hypothesis of no interaction effects are returned.

  • IAD_PD: If y=NULL then the variance of predictions under the null hypothesis of no interaction effects of set s is returned. If y!=NULL, then the MSE of the prediction model is returned.

  • z3 Terms used to calculate the the interaction difference. under the null hypothesis of no interaction effects is returned.

  • IAD: Interaction difference calculated as mean of the terms z3.

Note

Numerical precision does have influence on Type I error percentages. Under the null hypothesis values of the test statistic can be very small near zero. To lessen the impact of random rounding errors increased numerical precision is important.

Author(s)

Thomas Welchowski welchow@imbie.meb.uni-bonn.de

See Also

pdpEst_mpfr

Examples


#####################
# Simulation example
#####################

library(mvnfast)
library(mgcv)

############################################################################
# H0: Covariate X1 does not contribute to interaction effects in the 
# prediction model.

# Train data
# Simulate covariates from multivariate standard normal distribution
set.seed(-72498)
nSize <- 100
X <- mvnfast::rmvn(n=nSize, mu=rep(0, 2), sigma=diag(2))

# Response generation
y <- X[, 1]^2 + rnorm(n=nSize, mean=0, sd=0.25)

# Complete data frame
trainDat <- data.frame(X, y=y)

# Test data
# Simulate covariates from multivariate standard normal distribution
testX <- mvnfast::rmvn(n=nSize, mu=rep(0, 2), sigma=diag(2))

# Response generation
testY <- testX[, 1]^2 + rnorm(n=nSize, mean=0, sd=0.25)
testDat <- data.frame(testX, y=testY)

# Estimate generalized additive model with training data
library(mgcv)
gamFit <- gam(formula=y~s(X1)+s(X2), data=trainDat,
              family=gaussian())

# Test interaction
testIAD_mpfr1 <- testIAD_mpfr(colInd=1, object=gamFit,
                              predictfun=function(object, X){
                                predict(object=object, newdata=X, type="response")
                              }, X=testDat)
testIAD_mpfr1$pValue
# -> H0 is not rejected with alpha=0.05

###############################################################
# H1: X1 does contribute to at least one interaction effect
# in the prediction model.

# Train data
# Simulate covariates from multivariate standard normal distribution
set.seed(-72498)
nSize <- 150
X <- mvnfast::rmvn(n=nSize, mu=rep(0, 2), sigma=diag(2))

# Response generation
y <- X[, 1]^2 * X[, 2] + rnorm(n=nSize, mean=0, sd=0.25)

# Complete data frame
trainDat <- data.frame(X, y=y)

# Test data
# Simulate covariates from multivariate standard normal distribution
testX <- mvnfast::rmvn(n=nSize, mu=rep(0, 2), sigma=diag(2))

# Response generation
testY <- testX[, 1]^2 * testX[, 2] + rnorm(n=nSize, mean=0, sd=0.25)
testDat <- data.frame(testX, y=testY)

# Estimate generalized additive model with training data
library(mgcv)
gamFit <- gam(formula=y~s(X1, X2, k=25), data=trainDat,
              family=gaussian())

# Test interaction
testIAD_mpfr1 <- testIAD_mpfr(colInd=1, object=gamFit,
                              predictfun=function(object, X){
                                predict(object=object, newdata=X, type="response")
                              }, X=testDat)
testIAD_mpfr1$pValue
# -> H0 is rejected with alpha=0.05


IADT documentation built on May 29, 2024, 8:47 a.m.