Projection to Latent Structures for Generalized Linear Models
Usage
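The usage block did not survive extraction. A plausible signature, reconstructed from the argument list below, is shown here; the default values are assumptions, not taken from the source.

    gpls(formula, data, ncomp = 2, eps = 1e-6, maxit = 100,
         denom.eps = 1e-20, family = "gaussian", link = NULL,
         firth = FALSE, contrasts = NULL, ...)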
Arguments

formula     a model formula
data        a data frame containing the variables in the model
ncomp       the number of latent components to retain
eps         the convergence tolerance for the iterative fitting algorithm
maxit       the maximum number of iterations
denom.eps   the tolerance below which a denominator is treated as zero
family      one of "gaussian", "poisson", "negative.binomial", "binomial",
            "multinom", "Gamma", or "inverse.gaussian"
link        the link function; see Details for the available options
firth       should Firth's bias correction be applied? Defaults to FALSE.
contrasts   an optional list of model contrasts
...         other arguments
Details

This function implements what is often called partial least squares for generalized linear models. However, the Swedish statisticians Herman Wold and Svante Wold, who invented the method, maintained that the proper name is projection to latent structures. That name is used here because "least squares" is a misnomer for generalized linear models, which are not fit by least squares. As the name implies, PLS works by projecting the predictor matrix onto a lower-dimensional subspace of latent factors (factors in the sense of factor analysis, not categorical variables). This can save an analytic step: rather than first asking "what factors underlie my variables?" and then using the factor scores as predictors, PLS directly answers the question "what latent factors explain my outcome variable?" The returned regression coefficients correspond to the original explanatory variables, which facilitates inference about those variables when the method is used for purposes other than factor analysis.
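As a sketch of the step being saved, compare a two-stage factor-scores regression with a single call to gpls on simulated data; the gpls call simply mirrors the argument list above.

    set.seed(1)
    X <- matrix(rnorm(100 * 6), 100, 6)
    y <- rpois(100, exp(0.3 * rowSums(X[, 1:3])))
    dat <- data.frame(y, X)

    # Two stages: extract latent factors first, then regress the outcome on their scores
    scores <- prcomp(X, scale. = TRUE)$x[, 1:2]
    fit_two_stage <- glm(y ~ scores, family = poisson)

    # One stage: the latent factors are chosen to explain y directly
    fit_pls <- gpls(y ~ ., data = dat, ncomp = 2, family = "poisson")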
PLS regression is useful in a variety of circumstances, including:

- multicollinear predictors
- variables believed to be measures of an underlying latent factor (which typically entails multicollinearity)
- regression problems with more predictors than observations (rank deficiency)
- minimizing prediction error in a manner similar to ridge regression
- recovering a set of factors that explain an outcome, a sort of "supervised factor analysis"
Several likelihood functions are implemented here, covering the gaussian, poisson, binomial, gamma, inverse gaussian, and negative binomial distributions. The gaussian distribution is the natural choice for continuous data. The binomial distribution is used for binary outcomes, while the multinomial distribution can model outcomes with multiple categories. The poisson and negative binomial distributions are appropriate for integer count data, with the negative binomial well suited to overdispersed counts. The gamma and inverse gaussian distributions are appropriate for continuous data with positive support; the gamma assumes a constant coefficient of variation, while the inverse gaussian suits heteroskedastic and/or highly skewed data.
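As an illustration of matching family to data, overdispersed counts might be fit with the negative binomial family. The data below are simulated, and the call simply uses the argument names documented above.

    set.seed(3)
    X <- matrix(rnorm(150 * 5), 150, 5)
    mu <- exp(0.4 * X[, 1] - 0.3 * X[, 2])
    y <- rnbinom(150, mu = mu, size = 1.5)   # gamma-mixed Poisson: variance exceeds the mean
    dc <- data.frame(y, X)

    fit_nb <- gpls(y ~ ., data = dc, ncomp = 2,
                   family = "negative.binomial", link = "log")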
The following link functions are available for each distribution:

- Gaussian: "identity"
- Binomial & Multinomial: "logit", "probit", "cauchit", "robit" (Student t with 3 df), and "cloglog"
- Poisson & Negative Binomial: "log"
- Gamma: "inverse" (1/x)
- Inverse Gaussian: "1/mu^2" (1/x^2)
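For example, a binary outcome could be paired with a probit link; "robit" or "cloglog" would be specified the same way. This is an illustrative call on simulated data.

    set.seed(2)
    X <- matrix(rnorm(200 * 4), 200, 4)
    p <- drop(pnorm(X %*% c(1, -1, 0.5, 0)))   # probabilities on the probit scale
    yb <- rbinom(200, 1, p)
    db <- data.frame(yb, X)

    fit_probit <- gpls(yb ~ ., data = db, ncomp = 2,
                       family = "binomial", link = "probit")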
Value

A gpls object containing the model fit, factor loadings, linear predictors, fitted values, and an assortment of other quantities.
Examples
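The original example did not survive extraction. A minimal illustration in its spirit, using simulated data and the argument names documented above, might look like this:

    set.seed(4)
    X <- matrix(rnorm(50 * 10), 50, 10)                  # ten predictors, only five informative
    y <- drop(X %*% rep(c(0.5, 0), each = 5)) + rnorm(50)
    d <- data.frame(y, X)

    fit <- gpls(y ~ ., data = d, ncomp = 3, family = "gaussian", link = "identity")
    str(fit)   # inspect the loadings, linear predictors, and fitted values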