Description Usage Arguments Details Value References Examples
The function can test goodness-of-fit of a low- or high-dimensional generalized linear model (GLM) by detecting the presence of nonlinearity in the conditional mean function of y given X. Outputs a p-value.
1 2 3 |
X |
A matrix or a data frame with |
y |
Response vector with |
fam |
Must be "gaussian", "binomial" or "poisson". |
RP_function |
(optional) User specified function for residual prediction (see Details below). |
nsplits |
Number of splits of the data set (see Details below). |
penalize |
|
output_all |
If |
This function tests if the conditional mean of y
given X
could be
originating from a GLM family specified by the user via fam
.
The function works by splitting the data into parts A and B,
and computes a GLM fit on both parts.
If penalize == TRUE
, these fits use cv.glmnet
from package glmnet
,
otherwise they use glmnet
with penalty set to 0. If
RP_function
(optional) is not supplied by
the user, randomForest
is used to predict remaining signal from the residuals from
GLM fit on part A. The test statistic is proportional to the dot product
between the random forest prediction and residuals from GLM fit on part B.
If nsplits
is greater than one, the above procedure is repeated nsplits
times and the resulting p-values are aggregated using the approach from Meinshausen at al. (2012)
A user may supply their own residual prediction function to replace random forest via
parameter RP_function
(see Examples for use). The function must take as arguments
an input matrix XA
, vector resA
(with length nrow(XA)
) and matrix XB
. Its
role is to regress resA
on input matrix XA
with a preferred residual
prediction method and output a vector with dimensions nrow(XB)
that contains predictions of this fit on input XB
.
If output_all = FALSE
, the function outputs a single p-value.
Otherwise it returns a list containing the aggregated p-value in pval
and
a vector of p-values from all splits in pvals
.
Janková, J., Shah, R. D., Bühlmann, P. and Samworth, R. (2019) Goodness-of-fit testing in high-dimensional generalized linear models https://arxiv.org/abs/1908.03606 Meinshausen, N., Meier, L. and Bühlmann, P. (2012) p-Values for High-Dimensional Regression Journal of the American Statistical Association, 104:488, 1671-1681
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 | # Testing for nonlinearity: Logistic link function
set.seed(1)
X <- matrix(rnorm(300*30), 300, 30)
z <- X[, 1] + X[, 2]^4
pr <- 1/(1 + exp(-z))
y <- rbinom(nrow(X), 1, pr)
(out <- GRPtest(X, y, fam = "binomial", nsplits = 5))
# Testing for nonlinearity: Define your own RP function
# use package xyz
my_RP_function <- function(XA, resA, XB){
xyz_fit <- xyz_regression(XA, resA)
predict(xyz_fit, newdata = as.matrix(XB))[,5]
}
library(xyz)
set.seed(2)
X <- matrix(rnorm(500*30), 500, 30)
z <- X[,1:3]%*%rep(1,3) + 1*X[, 1]*X[,5]
mu <- exp(z)
y <- rpois(n = nrow(X), lambda = mu)
(out <- GRPtest(X, y, fam = "poisson", RP_function = my_RP_function))
## Not run:
# An example with factors (labelled as "Not run" due to running time > 10s)
set.seed(1)
n <- 2021
X1 <- sample(c("A","B","C"), n, replace = TRUE)
X1 <- factor(X1, levels = c("A", "B", "C"))
X2 <- sample(c("Male","Female"), n, replace = TRUE)
X2 <- factor(X2, levels = c("Male", "Female"))
X3 <- rnorm(n)
X <- data.frame(X1, X2, X3)
# Generate response y1 using a logistic regression model
prob1 <- 1 / (1 + exp( - (X1 == "B") + 2*(X1 == "C") - 2*(X2 == "Male") - X3) )
y1 <- rbinom(n, 1, prob1)
# Output p-value for goodness of fit of the logistic regression model
(out <- GRPtest(X, y1, fam = "binomial", nsplits = 10))
# Generate response y2 using a logistic regression model but with an interaction between X1 and X2
prob2 <- 1 / (1 + exp( - (X1 == "B") + 2*(X1 == "C") + 2*(X1 == "B")*(X2 == "Male") - 0.5*X3) )
y2 <- rbinom(n, 1, prob2)
# Test goodness of fit of the logistic regression model
(out <- GRPtest(X, y2, fam = "binomial", nsplits = 10))
## End(Not run)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.