gldrm: Fits a generalized linear density ratio model (GLDRM)

gldrm R Documentation

Description

A GLDRM is a semiparametric generalized linear model. In contrast to a GLM, which assumes a particular exponential family distribution, the GLDRM uses a semiparametric likelihood to estimate the reference distribution. The reference distribution may be any discrete, continuous, or mixed exponential family distribution. The model parameters, which include both the regression coefficients and the cdf of the unspecified reference distribution, are estimated by maximizing a semiparametric likelihood. Regression coefficients are estimated with no loss of efficiency, i.e. the asymptotic variance is the same as if the true exponential family distribution were known.

Usage

gldrm(
  formula,
  data = NULL,
  link = "identity",
  mu0 = NULL,
  offset = NULL,
  gldrmControl = gldrm.control(),
  thetaControl = theta.control(),
  betaControl = beta.control(),
  f0Control = f0.control()
)

Arguments

formula

An object of class "formula".

data

An optional data frame containing the variables in the model.

link

Link function. Can be a character string to be passed to the make.link function in the stats package (e.g. "identity", "logit", or "log"). Alternatively, link can be a list containing three functions named linkfun, linkinv, and mu.eta. The first is the link function. The second is the inverse link function. The third is the derivative of the inverse link function. All three functions must be vectorized.

mu0

Mean of the reference distribution. The reference distribution is not unique unless its mean is restricted to a specific value. This value can be any number within the range of observed values, but values near the boundary may cause numerical instability. This is an optional argument with mean(y) being the default value.

offset

Known component of the linear term. The offset must be passed through this argument; offset terms in the formula will be ignored.

sampprobs

If sampling weights are a function of both the response value and covariates, then sampprobs must be an n \times q matrix, where n is the number of observations and q is the number of unique observed values in the response vector. If sampling weights do not depend on the covariate values, then sampprobs may alternatively be passed as a vector of length n. All values must be nonnegative and are assumed to correspond to the sorted response values in increasing order.

gldrmControl

Optional control arguments. Passed as an object of class "gldrmControl", which is constructed by the gldrm.control function. See gldrm.control documentation for details.

thetaControl

Optional control arguments for the theta update procedure. Passed as an object of class "thetaControl", which is constructed by the theta.control function. See theta.control documentation for details.

betaControl

Optional control arguments for the beta update procedure. Passed as an object of class "betaControl", which is constructed by the beta.control function. See beta.control documentation for details.

f0Control

Optional control arguments for the f0 update procedure. Passed as an object of class "f0Control", which is constructed by the f0.control function. See f0.control documentation for details.

Details

The list elements linkfun, linkinv, and mu.eta mirror the components of the "link-glm" class. Objects of this class can be created with the stats::make.link function.
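
As a minimal sketch using only base R, the following builds a "link-glm" object with stats::make.link and an equivalent hand-written list, then checks numerically that mu.eta is the derivative of linkinv. The names log_link and custom_link and the test values of eta are illustrative and are not part of the package.

```r
# Built-in log link: an object of class "link-glm"
log_link <- stats::make.link("log")
class(log_link)

# Equivalent hand-written list with the three required, vectorized functions
custom_link <- list(
  linkfun = function(mu) log(mu),    # link function
  linkinv = function(eta) exp(eta),  # inverse link function
  mu.eta  = function(eta) exp(eta)   # derivative of the inverse link
)

eta <- c(-1, 0, 0.5, 2)
h <- 1e-6

# Central finite difference of the inverse link
num_deriv <- (custom_link$linkinv(eta + h) - custom_link$linkinv(eta - h)) / (2 * h)

# mu.eta should agree with the numerical derivative and with make.link's version
all.equal(custom_link$mu.eta(eta), num_deriv, tolerance = 1e-6)
all.equal(custom_link$mu.eta(eta), log_link$mu.eta(eta))
```

A check of this kind may be worth running on any custom link before passing it to gldrm, since an incorrect mu.eta would not be detected by the list structure alone.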

The "gldrm" class is a list of the following items.

  • conv Logical indicator for whether the gldrm algorithm converged within the iteration limit.

  • iter Number of iterations used. A single iteration is a beta update, followed by an f0 update.

  • llik Semiparametric log-likelihood of the fitted model.

  • beta Vector containing the regression coefficient estimates.

  • mu Vector containing the estimated mean response value for each observation in the training data.

  • eta Vector containing the estimated linear combination of covariates for each observation.

  • f0 Vector containing the semiparametric estimate of the reference distribution, evaluated at the observed response values. The values of f0 correspond to the support values in spt, sorted in increasing order.

  • spt Vector containing the unique observed response values, sorted in increasing order.

  • mu0 Mean of the estimated semiparametric reference distribution. The mean of the reference distribution must be fixed for the model to be identifiable. It can be fixed at any value within the range of observed response values; by default, the gldrm function sets mu0 to the mean of the observed response values.

  • varbeta Estimated variance matrix of the regression coefficients.

  • seBeta Standard errors for \hat{\beta}. Equal to sqrt(diag(varbeta)).

  • seMu Standard errors for \hat{\mu} computed from varbeta.

  • seEta Standard errors for \hat{\eta} computed from varbeta.

  • theta Vector containing the estimated tilt parameter for each observation. The tilted density function of the response variable is given by

    f(y|x_i) = f_0(y) \exp(\theta_i y) / \int f_0(u) \exp(\theta_i u) du.

  • bPrime is a vector containing the mean of the tilted distribution, b'(\theta_i), for each observation. bPrime should match mu, except in cases where theta is capped for numerical stability.

    b'(\theta_i) = \int u f(u|x_i) du

  • bPrime2 is a vector containing the variance of the tilted distribution, b''(\theta_i), for each observation.

    b''(\theta_i) = \int (u - b'(\theta_i))^2 f(u|x_i) du

  • fTilt is a vector containing the semiparametric fitted probability, \hat{f}(y_i | x_i), for each observation. The semiparametric log-likelihood is equal to

    \sum_{i=1}^n \log \hat{f}(y_i | x_i).

  • sampprobs If sampling probabilities were passed through the sampprobs argument, then they are returned here in matrix form. Each row corresponds to an observation.

  • llikNull Log-likelihood of the null model with no covariates.

  • lr.stat Likelihood ratio test statistic comparing fitted model to the null model. It is calculated as 2 \times (llik - llik_0) / (p-1). The asymptotic distribution is F(p-1, n-p) under the null hypothesis.

  • lr.pval P-value of the likelihood ratio statistic.

  • fTiltMatrix is a matrix containing the semiparametric density for each observation, i.e. \hat{f}(y | x_i) for each unique y value. This is a matrix with nrow equal to the number of observations and ncol equal to the number of unique response values observed. Only returned if returnfTilt = TRUE in the gldrmControl arguments.

  • score.logf0 Score function for log(f0). Only returned if returnf0ScoreInfo = TRUE in the gldrmControl arguments.

  • info.logf0 Information matrix for log(f0). Only returned if returnf0ScoreInfo = TRUE in the gldrmControl arguments.

  • formula Model formula.

  • data Model data frame.

  • link Link function. If a character string was passed to the link argument, then this will be an object of class "link-glm". Otherwise, it will be the list of three functions passed to the link argument.
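
The relationship between f0, theta, fTilt, bPrime, and bPrime2 described above can be sketched for a discrete support, where the integrals become sums. This is an illustration in base R under assumed inputs, not the package's internal algorithm: the support spt, the probabilities f0, the helper tilt, and the target mean mu are all hypothetical.

```r
# Hypothetical reference distribution on a small discrete support
spt <- c(1, 2, 3, 4)           # sorted unique response values
f0  <- c(0.1, 0.2, 0.3, 0.4)   # reference probabilities; they sum to 1

# Tilted distribution and its first two moments for one tilt parameter theta
tilt <- function(theta, spt, f0) {
  w <- f0 * exp(theta * spt)                 # unnormalized tilted mass
  fTilt <- w / sum(w)                        # f(y | x_i) on the support
  bPrime <- sum(spt * fTilt)                 # mean of the tilted distribution
  bPrime2 <- sum((spt - bPrime)^2 * fTilt)   # variance of the tilted distribution
  list(fTilt = fTilt, bPrime = bPrime, bPrime2 = bPrime2)
}

# theta = 0 recovers the reference distribution, whose mean is mu0
mu0 <- sum(spt * f0)
tilt(0, spt, f0)$bPrime

# Solve b'(theta) = mu for a target mean inside the range of the support
mu <- 2.5
theta_hat <- uniroot(function(th) tilt(th, spt, f0)$bPrime - mu,
                     interval = c(-10, 10), tol = 1e-10)$root
tilt(theta_hat, spt, f0)$bPrime
```

Because the target mean here is below mu0, the solved tilt parameter is negative; a target above mu0 would give a positive value.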

Value

An S3 object of class "gldrm". See details.
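
The lr.stat and lr.pval items listed in Details can be reproduced from the two log-likelihoods. The sketch below uses made-up numbers in place of a fitted object, and it assumes that p counts the regression coefficients including the intercept and that the p-value is the upper tail of the F(p-1, n-p) distribution; neither assumption is confirmed by this page.

```r
# Hypothetical values; in practice llik and llikNull come from a fitted "gldrm" object
llik     <- -100.0   # log-likelihood of the fitted model
llikNull <- -112.5   # log-likelihood of the null model
n <- 150             # number of observations
p <- 6               # assumed: number of coefficients, including the intercept

# Likelihood ratio statistic as defined in Details
lr_stat <- 2 * (llik - llikNull) / (p - 1)

# Assumption: p-value is the upper tail of F(p - 1, n - p)
lr_pval <- pf(lr_stat, df1 = p - 1, df2 = n - p, lower.tail = FALSE)

lr_stat
lr_pval
```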

Examples

data(iris, package="datasets")

# Fit a gldrm with log link
fit <- gldrm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species,
             data=iris, link="log")
fit

# Fit a gldrm with custom link function
link <- list()
link$linkfun <- function(mu) log(mu)^3
link$linkinv <- function(eta) exp(eta^(1/3))
link$mu.eta <- function(eta) exp(eta^(1/3)) * 1/3 * eta^(-2/3)
fit2 <- gldrm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species,
              data=iris, link=link)
fit2
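
The beta and seBeta items of a fitted object can be combined into Wald-type confidence intervals. The helper wald_ci below is hypothetical and relies on a normal approximation, which is an assumption; the package's own summaries may use a different reference distribution. The numbers passed to it are illustrative, not output from a fitted model.

```r
# Wald-type confidence intervals from estimates and standard errors.
# Assumption: normal approximation for the coefficient estimates.
wald_ci <- function(beta, se, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)   # two-sided critical value
  cbind(estimate = beta, lower = beta - z * se, upper = beta + z * se)
}

# Illustrative inputs standing in for fit$beta and fit$seBeta
ci <- wald_ci(beta = c(1.2, -0.4), se = c(0.1, 0.2))
ci
```

With a fitted model from the examples above, the same helper could be called as wald_ci(fit$beta, fit$seBeta).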


gldrm documentation built on May 29, 2024, 4:28 a.m.
