marge: marge

View source: R/marge.R

margeR Documentation

marge

Description

MARS fitting function for generalized linear models (GLM) and generalized estimating equations (GEE).

Usage

marge(
  formula,
  data,
  N,
  n = 1,
  id = c(1:nrow(data)),
  family = "gaussian",
  corstr = "independence",
  pen = 2,
  tols_score = 1e-05,
  M = 21,
  minspan = NULL,
  print.disp = FALSE,
  nb = FALSE,
  is.gee = FALSE,
  ...
)

Arguments

formula

: an object of class "formula" (or one that can be coerced to that class): a symbolic description of first order terms to be included in the model to be fitted. See GLM function for further formula details.

data

: a data frame containing all the terms found in formula. Should have n by N rows.

N

: the number of clusters.

n

: the maximum cluster size.

id

: a vector which identifies the clusters. The length of id should be the same as the number of observations. Data are assumed to be sorted so that observations on a cluster are contiguous rows for all entities in the formula.

family

: the specified family for the GLM/GEE. The default is family = "gaussian". The current available families are: "gaussian", "binomial", "poisson" and the "negative binomial" (to get that use family = "poisson" and nb = TRUE).

corstr

: the specified "working correlation" structure for the GEE. The default is corstr = "independence".

pen

: a set penalty used for the GCV (note: MARGE doesn't actually use this). The default is 2.

tols_score

: the set tolerance for monitoring the convergence for the difference in score statistics between the parent and candidate model (this is the lack-of-fit criterion used for MARGE). The default is 0.00001

M

: a set threshold for the number of basis functions to be used. The default is 21.

minspan

: a set minimum span value. The default is minspan = NULL.

print.disp

: a logical argument, do you want to print the output? The default is FALSE.

nb

: a logical argument, is the model a negative binomial model? The default is FALSE.

is.gee

: is this a GEE model? The default is FALSE.

...

: further arguments passed to or from other methods.

Details

For further details please look at the mars_ls function - there are more details on the general MARS algorithm. MARGE will produce output for two penalties: 2 and log(N). A figure is automatically generated plotting WIC against the no. of parameters.

Value

marge returns a list of calculated values consisting of:

B_final : the basis model matrix for the final model fit.

wic_mat : a matrix of WIC values (with both penalties) for MARGE models given by the forward pass.

min_wic : the WIC (with both penalties) for the final MARGE model.

GCV : the GCV for the final selected model.

y_pred : the fitted values from the final selected model (with both penalties).

final_mod : the final selected (with both penalties) model matrix.

Author(s)

Jakub Stoklosa and David I. Warton.

References

Friedman, J. (1991). Multivariate adaptive regression splines. The Annals of Statistics, 19, 1–67.

Stoklosa, J., Gibb, H. and Warton, D.I. (2014). Fast forward selection for generalized estimating equations with a large number of predictor variables. Biometrics, 70, 110–120.

Stoklosa, J. and Warton, D.I. (2018). A generalized estimating equation approach to multivariate adaptive regression splines. Journal of Computational and Graphical Statistics, 27, 245–253.

See Also

mars_ls and backward_sel_WIC

Examples

# Example 1: Tree growth data from Wood (2006) monograph

library(mgcv)

N <- 14                            # Sample size (number of clusters).
n <- 6                             # Cluster size.
id <- factor(rep(1:N, each = n))   # The ID of each cluster.

Y0 <- matrix(log(Loblolly$height), ncol = n, byrow = TRUE) # Response variable.
Y <- c(t(Y0))
X_pred0 <- matrix(log(Loblolly$age), ncol = n, byrow = TRUE) # Predictor variable.
X_pred <- scale(c(t(X_pred0)))  # Standarize the predictor.
colnames(X_pred) <- c("age")

dat1 <- as.data.frame(cbind(X_pred, c(t(Y)), id))
colnames(dat1) <- c("age", "Y", "id")
dat1$age1 <- c(t(X_pred0))
dat1$id <- factor(dat1$id)

family <- "gaussian"   # The selected "exponential" family for the GLM/GEE.

is.gee <- TRUE         # Is the model a GEE?
nb <- FALSE            # Is this a negative binomial model?

tols_score <- 0.00001  # Tolerance for stopping condition in forward pass for GEE.
M <- 21                # Max. no. of terms.
pen <- 2               # Penalty to be used in GCV.
minspan <- NULL        # A set minimum span value.
print.disp <- FALSE    # Print ALL the output?

# Start model fitting functions here.

corstr <- "independence"  # Independent working correlation structure.
model_marge_ind <- marge(X_pred, Y, N, n, id, family, corstr, pen, tols_score,
M, minspan, print.disp, nb, is.gee)

corstr <- "ar1"           # AR1 working correlation structure.
model_marge_ar1 <- marge(X_pred, Y, N, n, id, family, corstr, pen, tols_score,
M, minspan, print.disp, nb, is.gee)

corstr <- "exchangeable"  # Exchangeable working correlation structure.
model_marge_exch <- marge(X_pred, Y, N, n, id, family, corstr, pen, tols_score,
M, minspan, print.disp, nb, is.gee)

# Example 2: Presence-absence data

# Load the "leptrine" presence-absence data.

data(leptrine)

dat1 <- leptrine[[1]]  # Training data.
Y <- dat1$Y            # Response variable.
N <- length(Y)         # Sample size (number of clusters).
n <- 1                 # Cluster size.
id <- rep(1:N, each = n)  # The ID of each cluster.

X_pred <- dat1[, -c(3:10)] # Design matrix using only two (of nine) predictors.

# Set MARGE tuning parameters.

family <- "binomial"   # The selected "exponential" family for the GLM/GEE.
is.gee <- FALSE        # Is the model a GEE?
nb <- FALSE            # Is this a negative binomial model?

tols_score <- 0.0001   # A set tolerance (stopping condition) in forward pass for MARGE.
M <- 21                # A set threshold for the maximum number of basis functions to be used.
print.disp <- FALSE    # Print ALL the output?
pen <- 2               # Penalty to be used in GCV.
minspan <- NULL        # A set minimum span value.

# Fit the MARGE models (about ~ 30 secs.)

mod <- marge(Y ~ RAIN_DRY_QTR + FC, data = dat1, N, n, id, family, corstr, pen, tols_score,
             M, minspan, print.disp, nb, is.gee)

JakubStats/marge documentation built on Feb. 25, 2024, 9:38 p.m.