goodnessoffit: Goodness of fit testing in regression models using Global...


Description

Tests the goodness of fit of a regression model against a specified alternative using the Global Test. Three main functions are provided: gtPS uses Penalized Splines, gtKS uses Kernel Smoothers and gtLI uses Linear Interactions. The other functions are for external use in combination with gt.

Usage

gtPS(response, null, data, 
      model = c("linear", "logistic", "cox", "poisson", "multinomial"),
      ..., covs, bdeg = 3, nint= 10, pord = 2, interact = FALSE, robust = FALSE, 
      termlabels = FALSE, returnZ = FALSE)
  
gtKS(response, null, data, 
      model = c("linear", "logistic", "cox", "poisson", "multinomial"),
      ..., covs, quant = .25, metric = c("euclidean", "pearson"), 
      kernel=c("uniform", "exponential", "triangular", "neighbours", "gauss"),
      robust = FALSE, scale = TRUE, termlabels = FALSE, returnZ = FALSE)
	
gtLI(response, null, data, ..., covs, iorder=2, termlabels = FALSE, standardize = FALSE)

bbase(x, bdeg, nint)

btensor(xs, bdeg, nint, pord, returnU=FALSE)

reparamZ(Z, pord, K=NULL, tol = 1e-10, returnU=FALSE)

reweighZ(Z, null.fit)

sterms(object, ...)

Arguments

response

The response vector of the regression model. May be supplied as a vector, as a formula object, or as a fitted model object of class lm, glm or coxph. In the last two cases (formula or fitted model), the specification of null is not required.

null

The null design matrix. May be given as a matrix or as a half formula object (e.g. ~a+b).

data

Only used when response or null is given in formula form. An optional data frame, list or environment containing the variables used in the formulae.

model

The type of regression model to be tested. If omitted, the function will try to determine the model from the class and values of the response argument.

...

Any other arguments are also passed on to gt.

covs

A variable or a vector of variables giving the covariates that the smooth terms are functions of.

bdeg

A vector or a list of vectors which specifies the degree of the B-spline basis, with default bdeg=3.

nint

A vector or a list of vectors which specifies the number of intervals determined by equally-spaced knots, with default nint=10.

pord

A vector or a list of vectors which specifies the order of the differences, indicating the type of penalty imposed on the coefficients, with default pord=2.

interact

TRUE to consider a multidimensional smooth function of covs.

termlabels

TRUE to consider e.g. s(log(cov)) instead of s(cov) when null=~ log(cov) and covs is missing.

robust

TRUE to obtain an overall test which combines multiple specifications of the B-spline basis arguments (when bdeg, nint and pord are lists) or multiple specifications of the bandwidth (when quant is a vector of quantiles).

returnZ

TRUE gives back the alternative design matrix used in the test.

quant

The smoothing bandwidth to be used, expressed as the percentile of the distribution of distance between observations, with default the 25th percentile. To investigate the sensitivity to different choices, quant can be a vector of percentiles. See also robust argument.

metric

A character string specifying the metric to be used. The available options are "euclidean" (the default), "pearson" and "mixed" (to be implemented). "mixed" distance is chosen automatically if some of the selected covariates are not numeric.

kernel

A character string giving the smoothing kernel to be used. This must be one of "uniform", "exponential", "triangular", "neighbours", or "gauss", with default "uniform".

scale

TRUE to center and scale the covariates before computing the distance.

iorder

Order of the linear interactions, e.g. second order interactions, third order etc., with default iorder=2.

standardize

TRUE standardizes all covariates of the alternative to have unit second central moment. This makes sure that the test result is independent of the relative scaling of the covariates.

x

A numeric vector of values at which to evaluate the B-spline basis.

xs

A matrix or data frame whose columns correspond to covariate values.

returnU

TRUE gives back the nonpenalized part.

Z

Alternative design matrix.

K

Penalty matrix (i.e. the penalty term is the quadratic form of K and the spline coefficients).

tol

Eigenvalues smaller than tol are considered zero.

null.fit

Fitted null model.

object

A gt.object from gtPS, gtKS or gtLI.

Details

These functions test for specific types of lack of fit by using the Global Test. Suppose that we are concerned with the adequacy of some regression model response ~ null, such as Y ~ X1 + X2. The alternative model can be cast into the generic form response ~ null + alternative, which comprises different models that accommodate different types of lack of fit. The specification of alternative is therefore required: it identifies the type of lack of fit the test is directed against.

By using gtPS, the alternative is given by a user-specified sum of smooth functions of continuous covariates, e.g. alternative= ~ s(X1) when covs="X1" and alternative= ~ s(X1) + s(X2) when covs=c("X1","X2"). Smooth terms are constructed using P-splines as proposed by Eilers and Marx (1996). This approach consists in constructing a B-spline basis of degree bdeg with nint + 1 equidistant knots, where a difference penalty of order pord is applied to the basis coefficients. If interact=TRUE, the alternative is given by a multidimensional smooth function of covs, which is represented by a tensor product of marginal B-spline bases and a Kronecker sum of the marginal penalties, e.g. alternative= ~ s(X1,X2) when covs=c("X1","X2") and interact=TRUE.
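The P-spline construction described above can be sketched in base R. This is a minimal illustration, assuming the splines package that ships with R; the helper name pspline_basis and its arguments xl and xr are hypothetical, and the code is not claimed to be what gtPS or bbase run internally.

```r
library(splines)  # ships with R; provides splineDesign()

# Hypothetical helper (not part of globaltest): B-spline basis of degree
# 'bdeg' on 'nint' equal intervals spanning [xl, xr].
pspline_basis <- function(x, bdeg = 3, nint = 10, xl = min(x), xr = max(x)) {
  dx <- (xr - xl) / nint
  # nint + 1 equidistant knots on [xl, xr], extended by bdeg knots on each side
  knots <- xl + dx * seq(-bdeg, nint + bdeg)
  splineDesign(knots, x, ord = bdeg + 1, outer.ok = TRUE)
}

set.seed(0)
X1 <- runif(50)

# Basis matrix with nint + bdeg columns, one row per observation
B <- pspline_basis(X1, bdeg = 3, nint = 10, xl = 0, xr = 1)

# Difference penalty of order pord on the coefficients a: |D a|^2 = a' K a
pord <- 2
D <- diff(diag(ncol(B)), differences = pord)
K <- crossprod(D)
```

With pord=2 the penalty leaves coefficient sequences that are linear in the basis index unpenalized, which is why a test against this alternative targets smooth departures beyond a straight line.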

By using gtKS, the alternative is given by a user-specified multidimensional smooth term, e.g. alternative= ~ s(X1, X2) when covs=c("X1","X2"). Multidimensional smooth terms are represented by a kernel smoother defined by a distance measure (metric), a kernel shape (kernel) and a bandwidth (quant). Because the test is sensitive to the chosen value of quant, it is possible to specify quant as a vector of different values in combination with robust=TRUE. Distance measures for factor covariates, and for the situation in which both continuous and factor covariates are present, are constructed as in le Cessie and van Houwelingen (1995), e.g. covs=c("X1","X2") and metric="mixed" when X1 is continuous and X2 is a factor (to be implemented).
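The three ingredients of the kernel smoother (metric, kernel, quant) can be illustrated in base R. The sketch below assumes a Euclidean metric on centered and scaled covariates, a uniform kernel, and a bandwidth equal to the 25th percentile of the pairwise distances. All variable names are illustrative, and the construction inside gtKS may differ in detail.

```r
set.seed(0)
X <- cbind(X1 = runif(50), X2 = runif(50))

# Distance measure (metric): Euclidean distance on centered and scaled covariates
Dmat <- as.matrix(dist(scale(X), method = "euclidean"))

# Bandwidth (quant): 25th percentile of the pairwise distances
h <- quantile(Dmat[lower.tri(Dmat)], probs = 0.25)

# Kernel shape (kernel): uniform, i.e. weight 1 within the bandwidth, 0 outside
W <- (Dmat <= h) * 1

# Row-normalized weights give a local-average smoother matrix
S <- W / rowSums(W)
```

Repeating the last three steps for several values of probs shows how strongly the neighbourhoods, and hence the alternative, depend on the bandwidth, which is the motivation for combining several quant values with robust=TRUE.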

By using gtLI, the alternative is given by all the possible ith-order linear interactions between covs, e.g. alternative= ~ X1:X2 + X1:X3 + X2:X3 when covs=c("X1","X2","X3") and iorder=2.
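The set of interaction terms in this example can be generated in base R as follows. This is a sketch of the idea only, with made-up data and illustrative variable names; it does not reproduce the internals of gtLI.

```r
set.seed(0)
d <- data.frame(X1 = rnorm(20), X2 = rnorm(20), X3 = rnorm(20))

covs   <- c("X1", "X2", "X3")
iorder <- 2

# All iorder-way combinations of the covariates, written as interaction terms
labels <- combn(covs, iorder, FUN = paste, collapse = ":")

# One-sided formula without intercept: ~ X1:X2 + X1:X3 + X2:X3 - 1
alt <- reformulate(labels, intercept = FALSE)

# Alternative design matrix: one column per product of covariates
Z <- model.matrix(alt, data = d)
```

For numeric covariates each column of Z is simply the elementwise product of the covariates in that term, so the number of columns grows as choose(length(covs), iorder).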

The remaining functions are meant for constructing the alternative design matrix that will be used in the alternative argument of the gt function. bbase constructs the B-spline basis for the covariate x. This function is based on the functions provided by Eilers and Marx (1996). btensor builds a tensor product of B-splines for the covariates xs, which is reparameterized according to a Kronecker sum of penalties. reparamZ reparameterizes the alternative design matrix (e.g. a spline basis B) according to the order of differences pord or via the spectral decomposition of a roughness matrix K. When several smooth terms are to be combined, reweighZ assigns equal weight to each component term.
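One common way to reparameterize a penalized B-spline basis according to the order of differences is shown below. It is a sketch under the assumption that the penalized part is obtained as B D'(DD')^{-1}, with D the difference matrix of order pord; whether reparamZ uses exactly this form is an assumption, and the variable names are illustrative.

```r
library(splines)  # ships with R; provides splineDesign()

set.seed(0)
x <- runif(50)
bdeg <- 3; nint <- 10; pord <- 2

# B-spline basis on [0, 1] with equally spaced knots
dx <- 1 / nint
knots <- dx * seq(-bdeg, nint + bdeg)
B <- splineDesign(knots, x, ord = bdeg + 1, outer.ok = TRUE)

# Difference matrix of order pord acting on the basis coefficients
D <- diff(diag(ncol(B)), differences = pord)

# Penalized part: coefficients u of Z correspond to a = D'(DD')^{-1} u,
# so that D a = u and the difference penalty on a becomes |u|^2
Z <- B %*% t(D) %*% solve(D %*% t(D))

# Nonpenalized part: polynomial of degree pord - 1 in x
U <- outer(x, 0:(pord - 1), `^`)
```

After this transformation the penalty on the new coefficients is the identity, so the columns of Z can be passed as an alternative design matrix while the columns of U play the role of the unpenalized polynomial trend.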

See the vignette for more examples.

Value

The function returns an object of class gt.object. Several operations and diagnostic plots can be made from this object.

Methods

sterms

(gt.object): Prints the smooth terms specified by gtPS, gtKS or gtLI.

Note

Currently linear (normal), logistic, multinomial logistic and Poisson regression models with canonical links and Cox's proportional hazards regression model are supported.

Author(s)

Aldo Solari: aldo.solari@unimib.it

References

Eilers, Marx (1996). Flexible smoothing with B-splines and penalties. Statistical Science, 11: 89-121.

le Cessie, van Houwelingen (1995). Testing the Fit of a Regression Model Via Score Tests in Random Effects Models. Biometrics 51: 600-614.

For references related to applications of the test, see the vignette GlobalTest.pdf included with this package.

See Also

The gt function. The gt.object and useful functions associated with that object.

Examples

    # Random data
    set.seed(0)
    X1<-runif(50)
    s1 <- function(x) exp(2 * x)
    e <- rnorm(50)
    Y <-  s1(X1) + e
    
    ### gtPS
    res<-gtPS(Y~X1)
    res@result
    sterms(res)
    
    # model input
    rdata<-data.frame(Y,X1)
    nullmodel<-lm(Y~X1,data=rdata)
    gtPS(nullmodel)
    
    # formula input and termlabels
    gtPS(Y~exp(2*X1),data=rdata)
    gtPS(Y~exp(2*X1),covs="exp(2 * X1)",data=rdata)
    sterms(gtPS(Y~exp(2*X1),data=rdata,termlabels=TRUE))
    
    # P-splines arguments 
    gtPS(Y~X1, bdeg=3, nint=list(a=10, b=30), pord=0)
    gtPS(Y~X1, bdeg=3, nint=list(a=10, b=30), pord=0, robust=TRUE)

    # Random data: additive model 
    X2<-runif(50)
    s2 <- function(x) 0.2 * x^11 * (10 * (1 - x))^6 + 10 * (10 * x)^3 * (1 - x)^10
    Y <-  s1(X1) + s2(X2) + e
    gtPS(Y~X1+X2)
    gtPS(Y~X1+X2, covs="X2")
    sterms(gtPS(Y~X1+X2, nint=list(a=c(10,30), b=20)))
    
    # Random data: smooth surface
    s12 <- function(a, b, sa = 1, sb = 1) {
            (pi^sa * sb) * (1.2 * exp(-(a - 0.2)^2/sa^2 - (b - 0.3)^2/sb^2) + 
            0.8 * exp(-(a - 0.7)^2/sa^2 - (b - 0.8)^2/sb^2))
            }
    Y <- s12(X1,X2) + e
    
    # Tensor product of P-splines
    res<-gtPS(Y~X1*X2, interact=TRUE)
    res@result
    sterms(res)

    ### gtKS  
    res<-gtKS(Y~X1*X2)
    res@result
    sterms(res)
    gtKS(Y~X1*X2, quant=seq(.05,.95,.05), robust=TRUE)
    
    ### gtLI  
    library(MASS)
    data(Boston)
    gtLI(medv~., data=Boston, standardize=TRUE)

globaltest documentation built on Nov. 8, 2020, 8:18 p.m.