goodnessoffit: Goodness of fit testing in regression models using Global...
In globaltest: Testing Groups of Covariates/Features for Association with a Response Variable, with Applications to Gene Set Testing


Tests the goodness of fit of a regression model against a specified alternative using the Global Test. Three main functions are provided: gtPS uses penalized splines, gtKS uses kernel smoothers and gtLI uses linear interactions. The remaining functions (bbase, btensor, reparamZ, reweighZ) construct alternative design matrices for direct use with gt, and sterms prints the terms tested by a gtPS, gtKS or gtLI object.

gtPS(response, null, data, 
      model = c("linear", "logistic", "cox", "poisson", "multinomial"),
      ..., covs, bdeg = 3, nint= 10, pord = 2, interact = FALSE, robust = FALSE, 
      termlabels = FALSE, returnZ = FALSE)
  
gtKS(response, null, data, 
      model = c("linear", "logistic", "cox", "poisson", "multinomial"),
      ..., covs, quant = .25, metric = c("euclidean", "pearson"), 
      kernel=c("uniform", "exponential", "triangular", "neighbours", "gauss"),
      robust = FALSE, scale = TRUE, termlabels = FALSE, returnZ = FALSE)
	
gtLI(response, null, data, ..., covs, iorder=2, termlabels = FALSE, standardize = FALSE)

bbase(x, bdeg, nint)

btensor(xs, bdeg, nint, pord, returnU=FALSE)

reparamZ(Z, pord, K=NULL, tol = 1e-10, returnU=FALSE)

reweighZ(Z, null.fit)

sterms(object, ...)

`response`	The response vector of the regression model. May be supplied as a vector, as a `formula` object, or as an object of class `lm`, `glm` or `coxph`. In the last two cases, the specification of `null` is not required.
`null`	The null design matrix. May be given as a matrix or as a one-sided `formula` object (e.g. `~a+b`).
`data`	Only used when `response` or `null` is given in formula form. An optional data frame, list or environment containing the variables used in the formulae.
`model`	The type of regression model to be tested. If omitted, the function will try to determine the model from the class and values of the `response` argument.
`...`	Any other arguments are also passed on to `gt`.
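`covs`	A character string or character vector naming the covariates on which the alternative is built: the arguments of the smooth terms (`gtPS`, `gtKS`) or the covariates whose interactions are formed (`gtLI`).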
`bdeg`	A vector or a list of vectors which specifies the degree of the B-spline basis, with default `bdeg=3`.
`nint`	A vector or a list of vectors which specifies the number of intervals determined by equally-spaced knots, with default `nint=10`.
`pord`	A vector or a list of vectors which specifies the order of the differences, i.e. the type of penalty imposed on the coefficients, with default `pord=2`.
`interact`	If `TRUE`, a single multidimensional smooth function of `covs` is tested instead of a sum of one-dimensional smooth terms.
`termlabels`	If `TRUE`, smooth terms are built on the terms of `null` rather than on the raw covariates, e.g. `s(log(cov))` instead of `s(cov)` when `null=~ log(cov)` and `covs` is missing.
`robust`	`TRUE` to obtain an overall test which combines multiple specifications of the B-spline basis arguments (when `bdeg`, `nint` and `pord` are lists) or multiple specifications of the bandwidth (when `quant` is a vector of quantiles).
`returnZ`	If `TRUE`, the alternative design matrix used in the test is returned.
`quant`	The smoothing bandwidth to be used, expressed as a percentile of the distribution of distances between pairs of observations, with default the 25th percentile. To investigate the sensitivity to different choices, `quant` can be a vector of percentiles. See also the `robust` argument.
`metric`	A character string specifying the metric to be used. The available options are "euclidean" (the default), "pearson" and "mixed" (to be implemented). "mixed" distance is chosen automatically if some of the selected covariates are not numeric.
`kernel`	A character string giving the smoothing kernel to be used. This must be one of "uniform", "exponential", "triangular", "neighbours", or "gauss", with default "uniform".
`scale`	`TRUE` to center and scale the covariates before computing the distance.
`iorder`	Order of the linear interactions (2 for second-order interactions, 3 for third-order, and so on), with default `iorder=2`.
`standardize`	If `TRUE`, all covariates of the alternative are standardized to have unit second central moment. This makes the test result independent of the relative scaling of the covariates.
`x`	A numeric vector of values at which to evaluate the B-spline basis.
`xs`	A matrix or data frame whose columns contain the covariate values.
`returnU`	If `TRUE`, the unpenalized part is also returned.
`Z`	Alternative design matrix.
`K`	Penalty matrix (i.e. the penalty term is the quadratic form of K and the spline coefficients).
`tol`	Eigenvalues smaller than `tol` are considered zero.
`null.fit`	Fitted null model.
`object`	A `gt.object` from `gtPS`, `gtKS` or `gtLI`.

These are functions to test for specific types of lack of fit by using the Global Test. Suppose that we are concerned with the adequacy of some regression model response ~ null, such as Y ~ X1 + X2. The alternative model can be cast into the generic form response ~ null + alternative, which covers different models accommodating different types of lack of fit. The specification of alternative is therefore required: it identifies the type of lack of fit the test is directed against.

By using gtPS, the alternative is given by a user-specified sum of smooth functions of continuous covariates, e.g. alternative= ~ s(X1) when covs="X1" and alternative= ~ s(X1) + s(X2) when covs=c("X1","X2"). Smooth terms are constructed using P-splines as proposed by Eilers and Marx (1996). This approach consists of constructing a B-spline basis of degree bdeg with nint + 1 equidistant knots and applying a difference penalty of order pord to the basis coefficients. If interact=TRUE, the alternative is given by a multidimensional smooth function of covs, represented by a tensor product of marginal B-spline bases and a Kronecker sum of the marginal penalties, e.g. alternative= ~ s(X1,X2) when covs=c("X1","X2") and interact=TRUE.
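
The P-spline ingredients described above can be sketched in base R. This is a minimal illustration, not the package's bbase code: bspline_basis_sketch and diff_penalty_sketch are hypothetical helpers, and the knot grid (extended by bdeg intervals beyond the data range) follows the usual Eilers and Marx construction, which is assumed rather than taken from globaltest.

```r
library(splines)  # splineDesign() ships with R's bundled splines package

# Hypothetical helper (not bbase()): B-spline basis of degree bdeg on nint
# equal intervals over range(x), extended by bdeg intervals on each side.
bspline_basis_sketch <- function(x, bdeg = 3, nint = 10) {
  xl <- min(x)
  dx <- (max(x) - xl) / nint
  knots <- xl + dx * ((-bdeg):(nint + bdeg))  # nint + 2*bdeg + 1 knots
  # outer.ok = TRUE guards against max(x) lying a rounding error past the grid
  splineDesign(knots, x, ord = bdeg + 1, outer.ok = TRUE)  # n x (nint + bdeg)
}

# Hypothetical helper: penalty matrix K = t(D) %*% D for differences of
# order pord; pord = 0 gives a ridge penalty on the coefficients.
diff_penalty_sketch <- function(m, pord = 2) {
  D <- if (pord == 0) diag(m) else diff(diag(m), differences = pord)
  crossprod(D)
}

set.seed(0)
x <- runif(50)
B <- bspline_basis_sketch(x, bdeg = 3, nint = 10)  # 50 x 13 basis
K <- diff_penalty_sketch(ncol(B), pord = 2)         # 13 x 13 penalty
```

With pord = 2, the penalty leaves constant and linear trends in the coefficients unpenalized, so K has rank ncol(B) - 2.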

By using gtKS, the alternative is given by a user-specified multidimensional smooth term, e.g. alternative= ~ s(X1, X2) when covs=c("X1","X2"). Multidimensional smooth terms are represented by a kernel smoother defined by a distance measure (metric), a kernel shape (kernel) and a bandwidth (quant). Because the test is sensitive to the chosen value of quant, quant can be specified as a vector of different values in combination with robust=TRUE. Distance measures for factor covariates, and for the situation in which both continuous and factor covariates are present, are constructed as in le Cessie and van Houwelingen (1995), e.g. covs=c("X1","X2") and metric="mixed" when X1 is continuous and X2 is a factor (to be implemented).
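
A rough picture of how metric, kernel and quant interact is given by the sketch below. kernel_matrix_sketch is hypothetical and only covers the Euclidean metric. The kernel formulas (an indicator for "uniform", exp(-(d/h)^2/2) for "gauss") are common textbook choices assumed here; globaltest's own definitions may differ.

```r
# Hypothetical helper, not gtKS internals: kernel matrix between observations.
kernel_matrix_sketch <- function(X, quant = 0.25,
                                 kernel = c("uniform", "gauss"),
                                 scale = TRUE) {
  kernel <- match.arg(kernel)
  X <- as.matrix(X)
  if (scale) X <- base::scale(X)   # center and scale each covariate
  D <- as.matrix(dist(X))          # Euclidean distances between observations
  # bandwidth h: the quant-th quantile of the pairwise (off-diagonal) distances
  h <- quantile(D[lower.tri(D)], probs = quant, names = FALSE)
  switch(kernel,
    uniform = (D <= h) * 1,           # 1 for pairs within distance h, else 0
    gauss   = exp(-0.5 * (D / h)^2))  # assumed Gaussian form, decays with distance
}

set.seed(0)
X <- cbind(X1 = runif(50), X2 = runif(50))
W <- kernel_matrix_sketch(X, quant = 0.25, kernel = "uniform")
```

With the uniform kernel and quant = 0.25, roughly a quarter of all pairs of observations are treated as neighbours. A smaller quant gives a more local smoother, which is why the test can be sensitive to this choice.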

By using gtLI, the alternative is given by all possible linear interactions of order iorder between covs, e.g. alternative= ~ X1:X2 + X1:X3 + X2:X3 when covs=c("X1","X2","X3") and iorder=2.
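
The interaction columns in this example can be built with base R's model.matrix. This is a sketch of the design only. gtLI may construct these columns differently, and with standardize=TRUE it also rescales them.

```r
set.seed(0)
dat <- data.frame(X1 = runif(50), X2 = runif(50), X3 = runif(50))
iorder <- 2
# (X1 + X2 + X3)^iorder expands to main effects plus all interactions up to iorder
M <- model.matrix(as.formula(paste0("~ (X1 + X2 + X3)^", iorder)), data = dat)
# keep only the columns whose term involves exactly iorder covariates
term_order <- lengths(strsplit(colnames(M), ":", fixed = TRUE))
Z <- M[, term_order == iorder, drop = FALSE]
colnames(Z)  # "X1:X2" "X1:X3" "X2:X3"
```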

The remaining functions are meant for constructing the alternative design matrix to be used in the alternative argument of the gt function. bbase constructs the B-spline basis for the covariate x and is based on the functions provided by Eilers and Marx (1996). btensor builds a tensor product of B-splines for the covariates xs, reparameterized according to a Kronecker sum of penalties. reparamZ reparameterizes the alternative design matrix (e.g. a spline basis B) according to the order of differences pord, or via the spectral decomposition of a roughness matrix K. When several smooth terms are to be combined, reweighZ assigns equal weight to each component term.
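
Reparameterization via a spectral decomposition can be sketched as follows, assuming the standard mixed-model representation of P-splines. Write K = U diag(lambda) t(U) and keep the eigenvectors with lambda > tol. The penalized part is then Z = B U+ diag(1/sqrt(lambda+)), and the unpenalized part spans the null space of K. reparam_sketch is a hypothetical helper and is not the reparamZ source. The reweighting done by reweighZ is not sketched here.

```r
library(splines)

# Hypothetical helper: split B into penalized (Z) and unpenalized (U) parts.
reparam_sketch <- function(B, K, tol = 1e-10) {
  eg <- eigen(K, symmetric = TRUE)
  pos <- eg$values > tol  # eigenvalues below tol are treated as zero
  Tm <- eg$vectors[, pos, drop = FALSE] %*%
    diag(1 / sqrt(eg$values[pos]), nrow = sum(pos))
  list(Z  = B %*% Tm,                                  # penalized directions
       U  = B %*% eg$vectors[, !pos, drop = FALSE],    # unpenalized (null space)
       Tm = Tm)                                        # coefficient transform
}

set.seed(0)
x <- runif(50)
dx <- (max(x) - min(x)) / 10
knots <- min(x) + dx * (-3:13)                          # cubic basis, 10 intervals
B <- splineDesign(knots, x, ord = 4, outer.ok = TRUE)   # 50 x 13
K <- crossprod(diff(diag(ncol(B)), differences = 2))    # second-order penalty
rp <- reparam_sketch(B, K)
```

For the new coefficients u, with beta = Tm u, the penalty t(beta) K beta reduces to sum(u^2). This is what allows Z to be treated like an ordinary set of covariates with equal prior variance.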

See the vignette for more examples.

The function returns an object of class gt.object. Several operations and diagnostic plots can be made from this object.

sterms (signature object = "gt.object"): Prints the smooth terms specified by gtPS, gtKS or gtLI.

Currently linear (normal), logistic, multinomial logistic and Poisson regression models with canonical links and Cox's proportional hazards regression model are supported.

Aldo Solari: aldo.solari@unimib.it

Eilers, Marx (1996). Flexible smoothing with B-splines and penalties. Statistical Science 11: 89-121.

le Cessie, van Houwelingen (1995). Testing the Fit of a Regression Model Via Score Tests in Random Effects Models. Biometrics 51: 600-614.

For references related to applications of the test, see the vignette GlobalTest.pdf included with this package.

The gt function. The gt.object and useful functions associated with that object.

    library(globaltest)

    # Random data
    set.seed(0)
    X1<-runif(50)
    s1 <- function(x) exp(2 * x)
    e <- rnorm(50)
    Y <-  s1(X1) + e
    
    ### gtPS
    res<-gtPS(Y~X1)
    res@result
    sterms(res)
    
    # model input
    rdata<-data.frame(Y,X1)
    nullmodel<-lm(Y~X1,data=rdata)
    gtPS(nullmodel)
    
    # formula input and termlabels
    gtPS(Y~exp(2*X1),data=rdata)
    gtPS(Y~exp(2*X1),covs="exp(2 * X1)",data=rdata)
    sterms(gtPS(Y~exp(2*X1),data=rdata,termlabels=TRUE))
    
    # P-splines arguments 
    gtPS(Y~X1, bdeg=3, nint=list(a=10, b=30), pord=0)
    gtPS(Y~X1, bdeg=3, nint=list(a=10, b=30), pord=0, robust=TRUE)

    # Random data: additive model 
    X2<-runif(50)
    s2 <- function(x) 0.2 * x^11 * (10 * (1 - x))^6 + 10 * (10 * x)^3 * (1 - x)^10
    Y <-  s1(X1) + s2(X2) + e
    gtPS(Y~X1+X2)
    gtPS(Y~X1+X2, covs="X2")
    sterms(gtPS(Y~X1+X2, nint=list(a=c(10,30), b=20)))
    
    # Random data: smooth surface
    s12 <- function(a, b, sa = 1, sb = 1) {
            (pi^sa * sb) * (1.2 * exp(-(a - 0.2)^2/sa^2 - (b - 0.3)^2/sb^2) + 
            0.8 * exp(-(a - 0.7)^2/sa^2 - (b - 0.8)^2/sb^2))
            }
    Y <- s12(X1,X2) + e
    
    # Tensor product of P-splines
    res<-gtPS(Y~X1*X2, interact=TRUE)
    res@result
    sterms(res)

    ### gtKS  
    res<-gtKS(Y~X1*X2)
    res@result
    sterms(res)
    gtKS(Y~X1*X2, quant=seq(.05,.95,.05), robust=TRUE)
    
    ### gtLI  
    library(MASS)
    data(Boston)
    gtLI(medv~., data=Boston, standardize=TRUE)