View source: R/family.categorical.R

cumulative | R Documentation |

Fits a cumulative link regression model to a (preferably ordered) factor response.

```
cumulative(link = "logitlink", parallel = FALSE, reverse = FALSE,
multiple.responses = FALSE, thresholds = c("unconstrained",
"equidistant", "symmetric1", "symmetric0"), Treverse = reverse,
Tref = if (Treverse) "M" else 1, whitespace = FALSE)
```

`link` |
Link function applied to the |

`parallel` |
A logical or formula specifying which terms have
equal/unequal coefficients.
See below for more information about the parallelism
assumption.
The default results in what some people call the
The |

`reverse` |
Logical.
By default, the cumulative probabilities used are
This should be set to |

`multiple.responses` |
Logical.
Multiple responses? If |

`thresholds` |
Character.
The choices concern the fitted intercepts.
They can be constrained to be equally-spaced, etc.
See If equally-spaced then the direction and the reference
level are controlled
by If If |

`Treverse, Tref` |
Support arguments for |

`whitespace` |
See |

In this help file the response `Y`

is assumed
to be a factor with ordered values `1,2,\dots,J+1`

.
Hence `M`

is the number of linear/additive
predictors `\eta_j`

;
for `cumulative()`

one has `M=J`

.

This VGAM family function fits the class of
*cumulative link models* to (hopefully)
an ordinal response.
By default, the *non-parallel* cumulative logit model
is fitted, i.e.,

`\eta_j = logit(P[Y \leq j])`

where `j=1,2,\dots,M`

and
the `\eta_j`

are not constrained to be parallel.
This is also known as the *non-proportional odds model*.
If the logit link is replaced by a complementary log-log link
(`clogloglink`

) then
this is known as the *proportional-hazards model*.

In almost all the literature, the constraint matrices
associated with this family of models are known.
For example, setting
`parallel = TRUE`

will make all constraint matrices
(except for the intercept) equal to a vector of `M`

1's.
If the constraint matrices are equal, unknown and to
be estimated,
then this can be achieved by fitting the model as a
reduced-rank vector generalized
linear model (RR-VGLM; see `rrvglm`

).
Currently, reduced-rank vector generalized additive models
(RR-VGAMs) have not been implemented here.

An object of class `"vglmff"`

(see `vglmff-class`

).
The object is used by modelling functions
such as `vglm`

,
and `vgam`

.

No check is made to verify that the response is ordinal
if the response is a matrix;
see `ordered`

.

Boersch-Supan (2021) looks at sparse data and
the numerical problems that result;
see `sratio`

.

The response should be either a matrix of counts
(with row sums that
are all positive), or a factor. In both cases,
the `y`

slot
returned by
`vglm`

/`vgam`

/`rrvglm`

is the matrix
of counts.
The formula must contain an intercept term.
Other VGAM family functions for an ordinal response
include
`acat`

,
`cratio`

,
`sratio`

.
For a nominal (unordered) factor response, the multinomial
logit model (`multinomial`

) is more appropriate.

With the logit link, setting `parallel = TRUE`

will fit a
proportional odds model. Note that the `TRUE`

here
does not apply to the intercept term.
In practice, the validity of
the *proportional odds assumption* needs to be checked,
e.g., by a likelihood ratio test (LRT).
If acceptable on the data, then numerical
problems are less
likely to occur during the fitting, and there are less
parameters. Numerical problems occur when the linear/additive
predictors cross, which results in probabilities outside of
`(0,1)`

; setting `parallel = TRUE`

will help
avoid this problem.

Here is an example of the usage of the `parallel`

argument.
If there are covariates `x2`

, `x3`

and
`x4`

, then
`parallel = TRUE ~ x2 + x3 -1`

and
`parallel = FALSE ~ x4`

are equivalent.
This would constrain the regression coefficients
for `x2`

and `x3`

to be equal;
those of the intercepts and `x4`

would be different.

If the data is inputted in *long* format
(not *wide* format, as in `pneumo`

below)
and the self-starting initial values are not
good enough then try using
`mustart`

,
`coefstart`

and/or
`etatstart`

.
See the example below.

To fit the proportional odds model one can use the
VGAM family function `propodds`

.
Note that `propodds(reverse)`

is equivalent to
`cumulative(parallel = TRUE, reverse = reverse)`

(which is equivalent to
`cumulative(parallel =`

`TRUE, reverse = reverse, link = "logitlink")`

).
It is for convenience only. A call to
`cumulative()`

is preferred since it reminds the user
that a parallelism assumption is made, as well as
being a lot more flexible.

Category specific effects may be modelled using
the `xij`

-facility; see
`vglm.control`

and `fill1`

.

With non-default `thresholds`

choices,
the first few fitted regression coefficients need care when
interpreting them.
For example, some values could be the distance away from
the median intercept.
Typing something like `constraints(fit)[[1]]`

gives
the constraint matrix of the intercept term.

Thomas W. Yee

Agresti, A. (2013).
*Categorical Data Analysis*,
3rd ed. Hoboken, NJ, USA: Wiley.

Agresti, A. (2010).
*Analysis of Ordinal Categorical Data*,
2nd ed. Hoboken, NJ, USA: Wiley.

McCullagh, P. and Nelder, J. A. (1989).
*Generalized Linear Models*, 2nd ed.
London: Chapman & Hall.

Tutz, G. (2012).
*Regression for Categorical Data*,
Cambridge: Cambridge University Press.

Tutz, G. and Berger, M. (2022).
Sparser ordinal regression models based
on parametric and additive location-shift
approaches.
*International Statistical Review*,
**90**, 306–327.
\Sexpr[results=rd]{tools:::Rd_expr_doi("10.1111/insr.12484")}.

Yee, T. W. (2010).
The VGAM package for categorical data analysis.
*Journal of Statistical Software*,
**32**, 1–34.
\Sexpr[results=rd]{tools:::Rd_expr_doi("10.18637/jss.v032.i10")}.

Yee, T. W. and Wild, C. J. (1996).
Vector generalized additive models.
*Journal of the Royal Statistical Society,
Series B, Methodological*,
**58**, 481–493.

`propodds`

,
`constraints`

,
`R2latvar`

,
`ordsup`

,
`prplot`

,
`margeff`

,
`acat`

,
`cratio`

,
`sratio`

,
`multinomial`

,
`CommonVGAMffArguments`

,
`pneumo`

,
`budworm`

,
`Links`

,
`hdeff.vglm`

,
`logitlink`

,
`probitlink`

,
`clogloglink`

,
`cauchitlink`

,
`gordlink`

,
`pordlink`

,
`nbordlink`

,
`logistic1`

.

```
# Proportional odds model (p.179) of McCullagh and Nelder (1989)
pneumo <- transform(pneumo, let = log(exposure.time))
(fit <- vglm(cbind(normal, mild, severe) ~ let,
cumulative(parallel = TRUE, reverse = TRUE), pneumo))
depvar(fit) # Sample proportions (good technique)
fit@y # Sample proportions (bad technique)
weights(fit, type = "prior") # Number of observations
coef(fit, matrix = TRUE)
constraints(fit) # Constraint matrices
apply(fitted(fit), 1, which.max) # Classification
apply(predict(fit, newdata = pneumo, type = "response"),
1, which.max) # Classification
R2latvar(fit)
# Check that the model is linear in let ----------------------
fit2 <- vgam(cbind(normal, mild, severe) ~ s(let, df = 2),
cumulative(reverse = TRUE), data = pneumo)
## Not run:
plot(fit2, se = TRUE, overlay = TRUE, lcol = 1:2, scol = 1:2)
## End(Not run)
# Check the proportional odds assumption with a LRT ----------
(fit3 <- vglm(cbind(normal, mild, severe) ~ let,
cumulative(parallel = FALSE, reverse = TRUE), pneumo))
pchisq(2 * (logLik(fit3) - logLik(fit)), df =
length(coef(fit3)) - length(coef(fit)), lower.tail = FALSE)
lrtest(fit3, fit) # More elegant
# A factor() version of fit ----------------------------------
# This is in long format (cf. wide format above)
Nobs <- round(depvar(fit) * c(weights(fit, type = "prior")))
sumNobs <- colSums(Nobs) # apply(Nobs, 2, sum)
pneumo.long <-
data.frame(symptoms = ordered(rep(rep(colnames(Nobs), nrow(Nobs)),
times = c(t(Nobs))),
levels = colnames(Nobs)),
let = rep(rep(with(pneumo, let), each = ncol(Nobs)),
times = c(t(Nobs))))
with(pneumo.long, table(let, symptoms)) # Should be same as pneumo
(fit.long1 <- vglm(symptoms ~ let, data = pneumo.long, trace = TRUE,
cumulative(parallel = TRUE, reverse = TRUE)))
coef(fit.long1, matrix = TRUE) # cf. coef(fit, matrix = TRUE)
# Could try using mustart if fit.long1 failed to converge.
mymustart <- matrix(sumNobs / sum(sumNobs),
nrow(pneumo.long), ncol(Nobs), byrow = TRUE)
fit.long2 <- vglm(symptoms ~ let, mustart = mymustart,
cumulative(parallel = TRUE, reverse = TRUE),
data = pneumo.long, trace = TRUE)
coef(fit.long2, matrix = TRUE) # cf. coef(fit, matrix = TRUE)
```

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.