BranchGLM: Fits GLMs

BranchGLM R Documentation

Fits GLMs

Description

Fits generalized linear models via RcppArmadillo. Models can optionally be fit in parallel via OpenMP.

Usage

BranchGLM(
  formula,
  data,
  family,
  link,
  offset = NULL,
  method = "Fisher",
  grads = 10,
  parallel = FALSE,
  nthreads = 8,
  tol = 1e-06,
  maxit = NULL,
  init = NULL,
  fit = TRUE,
  contrasts = NULL,
  keepData = TRUE,
  keepY = TRUE
)

BranchGLM.fit(
  x,
  y,
  family,
  link,
  offset = NULL,
  method = "Fisher",
  grads = 10,
  parallel = FALSE,
  nthreads = 8,
  init = NULL,
  maxit = NULL,
  tol = 1e-06
)

Arguments

formula

a formula for the model.

data

a data frame that contains the response and predictor variables.

family

distribution used to model the data, one of "gaussian", "gamma", "binomial", or "poisson".

link

link function relating the mean to the linear predictor, one of "identity", "logit", "probit", "cloglog", "sqrt", "inverse", or "log".

offset

offset vector; by default, the zero vector is used.

method

one of "Fisher", "BFGS", or "LBFGS". BFGS and L-BFGS are quasi-Newton methods that are typically faster than Fisher's scoring when there are many covariates (at least 50).

grads

number of gradients used to approximate the inverse information; only used when method = "LBFGS".

parallel

whether or not to make use of parallelization via OpenMP.

nthreads

number of threads used with OpenMP; only used if parallel = TRUE.

tol

tolerance used to determine model convergence.

maxit

maximum number of iterations performed. The default is 50 for Fisher's scoring and 200 for the other methods.

init

initial values for the betas; if not specified, they are selected automatically.

fit

a logical value indicating whether to fit the model. If this is FALSE, no coefficients matrix or variance-covariance matrix is returned.

contrasts

see contrasts.arg of model.matrix.default.

keepData

whether or not to store a copy of the data and the design matrix; the default is TRUE. If this is FALSE, the resulting object cannot be used inside of VariableSelection.

keepY

whether or not to store a copy of y; the default is TRUE. If this is FALSE, the binomial GLM helper functions may not work and the resulting object cannot be used inside of VariableSelection.

x

design matrix used for the fit, must be numeric.

y

outcome vector, must be numeric.
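As an illustration of how the arguments above combine, the following sketch fits a Poisson regression with L-BFGS and non-default tuning values. It assumes the BranchGLM package is installed (the call is skipped otherwise); the InsectSprays dataset and the specific values chosen for grads, tol, and maxit are arbitrary choices for this example, not recommendations.

```r
fit <- NULL  # stays NULL if BranchGLM is not installed

if (requireNamespace("BranchGLM", quietly = TRUE)) {
  library(BranchGLM)

  # Poisson regression with log link, fit by L-BFGS using 5 stored gradients
  fit <- BranchGLM(
    count ~ spray,
    data = InsectSprays,
    family = "poisson",
    link = "log",
    method = "LBFGS",
    grads = 5,      # only used because method = "LBFGS"
    tol = 1e-08,    # tighter than the default of 1e-06
    maxit = 100     # overrides the default of 200 for this method
  )

  # -1 here would signal that the algorithm failed to converge
  fit$iterations
}
```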

Details

The GLM can be fit with BFGS, L-BFGS, or Fisher's scoring. BFGS and L-BFGS are typically faster than Fisher's scoring when there are at least 50 covariates, while Fisher's scoring is typically best when there are fewer than 50 covariates. This function does not currently support the use of weights. In the special case of gaussian regression with identity link, the method argument is ignored and the normal equations are solved directly.
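The gaussian special case can be sketched in base R. This is only an illustration of solving the normal equations (X'X) beta = X'y, not the package's C++ implementation; the variable names are chosen for this example.

```r
# Design matrix and response for a gaussian model with identity link
X <- model.matrix(Sepal.Length ~ ., data = iris)
y <- iris$Sepal.Length

# Solve the normal equations (X'X) beta = X'y directly, without iterating
beta <- solve(crossprod(X), crossprod(X, y))

# Compare with lm(), which reaches the same estimates via a QR decomposition
beta_lm <- coef(lm(Sepal.Length ~ ., data = iris))
max_diff <- max(abs(drop(beta) - beta_lm))
```

Because the least squares solution is available in closed form, no iterative method, line search, or convergence tolerance is involved in this case.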

The models are fit in C++ by using Rcpp and RcppArmadillo. To help convergence, each of the methods uses a backtracking line search with the strong Wolfe conditions to find an adequate step size. Two criteria are used to assess convergence: whether there is a sufficient decrease in the negative log-likelihood, and whether the norm of the score is sufficiently small. The tol argument controls both of these criteria. If the algorithm fails to converge, then iterations will be -1.
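The following base R sketch shows the general shape of such an algorithm for logistic regression. It is a simplified illustration, not the package's code: it uses a plain Armijo (sufficient decrease) backtracking rule rather than the strong Wolfe conditions, it assumes that meeting either convergence criterion is enough to stop, and the function names negloglik and fisher_logit are hypothetical.

```r
# Negative log-likelihood for logistic regression
negloglik <- function(beta, X, y) {
  eta <- drop(X %*% beta)
  sum(log1p(exp(eta)) - y * eta)
}

# Fisher's scoring with Armijo backtracking (simplified sketch)
fisher_logit <- function(X, y, tol = 1e-06, maxit = 50) {
  beta <- rep(0, ncol(X))
  nll <- negloglik(beta, X, y)
  for (iter in seq_len(maxit)) {
    mu <- plogis(drop(X %*% beta))
    score <- crossprod(X, y - mu)              # gradient of the log-likelihood
    info <- crossprod(X, X * (mu * (1 - mu)))  # Fisher information X'WX
    # Criterion 1: the norm of the score is sufficiently small
    if (sqrt(sum(score^2)) < tol) {
      return(list(coefficients = beta, iterations = iter - 1))
    }
    step <- drop(solve(info, score))           # Fisher scoring direction
    slope <- sum(score * step)
    # Backtracking: halve the step size until the Armijo condition holds
    alpha <- 1
    repeat {
      new_beta <- beta + alpha * step
      new_nll <- negloglik(new_beta, X, y)
      if (new_nll <= nll - 1e-04 * alpha * slope || alpha < 1e-10) break
      alpha <- alpha / 2
    }
    decrease <- nll - new_nll
    beta <- new_beta
    nll <- new_nll
    # Criterion 2: the negative log-likelihood has stopped decreasing
    if (abs(decrease) < tol) {
      return(list(coefficients = beta, iterations = iter))
    }
  }
  # Mirror the documented convention: -1 signals failure to converge
  list(coefficients = beta, iterations = -1)
}

X <- model.matrix(vs ~ mpg, data = mtcars)
y <- mtcars$vs
fit_sketch <- fisher_logit(X, y, tol = 1e-08)
glm_coef <- coef(glm(vs ~ mpg, data = mtcars, family = binomial))
```

The estimates in fit_sketch$coefficients can be compared with glm_coef as a sanity check on the sketch.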

All observations with any missing values are removed before model fitting.
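The effect of this rule can be previewed in base R before fitting. The sketch below uses model.frame with na.action = na.omit to drop incomplete rows; it illustrates the same idea and is not taken from the package's internals. The positions of the inserted NA values are arbitrary.

```r
Data <- iris
Data$Sepal.Width[c(3, 10)] <- NA  # two rows with a missing predictor
Data$Species[25] <- NA            # one row with a missing factor level

# Rows with any NA among the model variables are dropped
mf <- model.frame(Sepal.Length ~ ., data = Data, na.action = na.omit)

# Number of observations that would be excluded from the fit
n_removed <- nrow(Data) - nrow(mf)
```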

The dispersion parameter for gamma regression is estimated via maximum likelihood, in a manner very similar to the gamma.dispersion function from the MASS package.
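For reference, the MASS function mentioned above can be applied to a gamma model fit with stats::glm, as in the sketch below. The simulated data (shape 2, so a true dispersion of 0.5) and all variable names are assumptions made for this example; the sketch shows the MASS estimator only, not the package's own computation.

```r
library(MASS)  # recommended package that ships with standard R installations

set.seed(1)
n <- 500
x <- runif(n)

# Gamma response with shape 2 (dispersion 1/2) and a log-linear mean
mu <- exp(1 + 0.5 * x)
y <- rgamma(n, shape = 2, rate = 2 / mu)

fit_gamma <- glm(y ~ x, family = Gamma(link = "log"))

disp_ml <- gamma.dispersion(fit_gamma)          # maximum likelihood estimate
disp_pearson <- summary(fit_gamma)$dispersion   # glm()'s Pearson-based estimate
```

The two estimates generally differ slightly, because summary.glm uses a moment estimator based on Pearson residuals rather than maximum likelihood.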

BranchGLM.fit can be faster than calling BranchGLM if the x matrix and y vector are already available, but it doesn't return as much information. The object returned by BranchGLM.fit is not of class BranchGLM, so methods for BranchGLM objects, such as predict or VariableSelection, cannot be used with it.

Value

BranchGLM returns a BranchGLM object, which is a list with the following components:

coefficients

a matrix with the coefficient estimates, standard errors, Wald test statistics, and p-values

iterations

number of iterations it took the algorithm to converge; if the algorithm failed to converge, this is -1

dispersion

the value of the dispersion parameter

logLik

the log-likelihood of the fitted model

vcov

the variance-covariance matrix of the fitted model

resDev

the residual deviance of the fitted model

AIC

the AIC of the fitted model

preds

predictions from the fitted model

linpreds

linear predictors from the fitted model

tol

tolerance used to fit the model

maxit

maximum number of iterations used to fit the model

formula

formula used to fit the model

method

iterative method used to fit the model

grads

number of gradients used to approximate inverse information for L-BFGS

y

y vector used in the model, not included if keepY = FALSE

x

design matrix used to fit the model, not included if keepData = FALSE

offset

offset vector in the model, not included if keepData = FALSE

data

original data frame supplied to the function, not included if keepData = FALSE

mf

the model frame, not included if keepData = FALSE

numobs

number of observations in the design matrix

names

names of the variables

yname

name of y variable

parallel

whether parallelization was employed to speed up the model fitting process

missing

number of missing values removed from the original dataset

link

link function used to model the data

family

family used to model the data

ylevel

the levels of y, only included for binomial GLMs

xlev

the levels of the factors in the dataset

terms

the terms object used

BranchGLM.fit returns a list with the following components:

coefficients

a matrix with the coefficient estimates, standard errors, Wald test statistics, and p-values

iterations

number of iterations it took the algorithm to converge; if the algorithm failed to converge, this is -1

dispersion

the value of the dispersion parameter

logLik

the log-likelihood of the fitted model

vcov

the variance-covariance matrix of the fitted model

resDev

the residual deviance of the fitted model

AIC

the AIC of the fitted model

preds

predictions from the fitted model

linpreds

linear predictors from the fitted model

tol

tolerance used to fit the model

maxit

maximum number of iterations used to fit the model
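Since a failed fit is signalled only through iterations being -1, it can be worth checking that component before using the results. The helper below is hypothetical (it is not part of BranchGLM), and the two lists are mock objects standing in for returned fits so that the sketch runs without the package.

```r
# Hypothetical helper: TRUE when the fit reports convergence
has_converged <- function(fit) {
  stopifnot(is.list(fit), !is.null(fit$iterations))
  fit$iterations != -1
}

# Mock objects carrying only the components needed for the check
ok_fit <- list(iterations = 5, tol = 1e-06, maxit = 50)
bad_fit <- list(iterations = -1, tol = 1e-06, maxit = 50)

has_converged(ok_fit)   # TRUE
has_converged(bad_fit)  # FALSE
```

If a fit fails this check, raising maxit or supplying starting values through init are the adjustments the arguments above make available.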

Examples

library(BranchGLM)
Data <- iris
### Using BranchGLM
BranchGLM(Sepal.Length ~ ., data = Data, family = "gaussian", link = "identity")

### Using BranchGLM.fit
x <- model.matrix(Sepal.Length ~ ., data = Data)
y <- Data$Sepal.Length
BranchGLM.fit(x, y, family = "gaussian", link = "identity")

BranchGLM documentation built on Aug. 31, 2023, 5:17 p.m.