pblm: Bivariate Additive Regression for Categorical Responses
In pblm: Bivariate Additive Marginal Regression for Categorical Responses

View source: R/pblm_0.1-12.R

pblm	R Documentation

Bivariate Additive Regression for Categorical Responses

Description

This function allows to fit bivariate additive marginal logistic regression models for nominal, ordinal or mixed nominal/ordinal responses.

Usage

pblm(fo1=NULL, fo2=NULL, fo12=NULL, RC.fo=NULL, data, weights=NULL, 
     type="gg", verbose=FALSE, contrasts=NULL, start=NULL, x=FALSE, 
     center=FALSE, scale=FALSE, plackett=NULL, ncat1=NULL, ncat2=NULL, 
     fit=TRUE, proportional=pblm.prop(...), penalty=pblm.penalty(...), 
     control=pblm.control(...), ...)

Arguments

`fo1`	a two hand-side formula for the logit(s) of the first response. Depending on the data structure, the left-hand side can be a two-column or a multi-column response vector. In the latter case, argument `ncat1` must be specified. See Examples.
`fo2`	a one hand-side formula for the logit(s) of the second response. If omitted, it will be assumed to be equal to `fo1`.
`fo12`	a one hand-side formula for the log-odds ratio(s) of the association between the two responses. If omitted, it will be assumed to be equal to `fo1`.
`RC.fo`	a Row/Column type formula specifying the association structure. See Details.
`data`	a data frame.
`weights`	An optional vector containing the observed frequencies.
`type`	a two-length string specifying the type of logits to use in the model fit. See Details.
`verbose`	logical. Should information about convergence be printed during model estimation?
`contrasts`	the Row/Column contrasts to be used in `RC.fo`. See argument `contrasts.arg` in `model.matrix.default`.
`x`	logical. Should the model matrix used in the fitting process be returned as component of the fitted model?
`center`	logical. Should the covariates be centered with respect their mean before the fit?
`scale`	logical. Should the covariates be scaled with respect their standard deviation before the fit?
`start`	an optional vector of starting values for the coefficients of the non-additive part.
`plackett`	logical. Should a Plackett-based formula be used for the inversion `\eta`->`\pi`. Actually, this is allowed (and it is the default) for binary or ordered responses only. The more general method uses the Newton-Raphson inversion algorithm described in Glonek and McCullagh (1995).
`ncat1`	an integer indicating the number of levels of the first response. Mandatory the data is in multicolumn format. See example below.
`ncat2`	an integer indicating the number of levels of the second response.
`fit`	logical. If TRUE (default), the model will be estimated, otherwise only some objects created prior to estimation will be returned.
`control`	this sets the control parameters for the Fisher-scoring and the inner Newton-Raphson and backfitting algorithms. The default setting is specified by the `pblm.control` function.
`proportional`	this sets a list of logical vectors specifying which explanatory variables depend on the categories of the responses. The default setting is specified by the `pblm.prop` function
`penalty`	this sets the penalty terms and smoothing parameters for a non-parametric "vertical" smoothing across the categories of the responses. The default setting is specified by the `pblm.penalty` function.
`...`	further arguments.

Details

It is possible to fit partial proportional odds models and specify several association structures like the Goodman's (1979) model (interactions are allowed though these are of linear type), the Dale's (1986) model and their additive version (Bustami et al., 2001), Enea et al.(2014). Furthermore, the association structure can also be smoothed by using penalty terms of polynomial type as those considered in Enea and Attanasio (2015). That allows to enlarge the range of possible parametrizations of the association structure as an alternative to the Dale's Row-Column parametrization. Further details on the penalty terms specified through pblm.penalty can also found in Enea and Lovison (2014).

The algorithm is based on the bivariate version of the model by Glonek and McCullagh (1995), that is by using the multivariate logit transform

\bm{C}'\log(\bm{M}\bm{\pi})= \bm{\eta}=\bm{X}\bm{\beta}.

Once fo1 has been specified, if fo2 and fo12 are left unspecified, these are assumed to be equal to fo1. By default, the function fits a proportional odds model for ordered responses, in which only the marginal and the association intercepts are assumed to be category-dependent (Glonek and McCullagh, 1995).

Model formulae using a Row-Column type parametrization, like in the Goodman or the Dale models, need to be specified by using RC.fo. The right-hand side of such formula only recognizes Row and Col to specify row and/or column effects. Covariates must be specified separately by using fo12. See Examples.
The logits implemented are local; global; continuation; reverse continuation; (Colombi and Forcina, 2001) and basic. By using argument type, several log-odds ratios can be specified, among the permutations of the local-local type (type="ll"), local-global ("lg"),..., basic-basic (type="bb"). Furthermore, if the responses are binary, setting type="ss" will correspond to classical logit \pi = \log P(Y=1)/P(Y=0) for both responses, while other specifications will produce \log P(Y=0)/P(Y=1).

The vector of the starting values must be set in the following order: itercepts of the first logit; covariates of the first logit; intercepts of the second logit; covariates of the second logit; association intercepts; association covariates.

For what concerning the additive part, p-slines can be fitted by using pb and/or pbs, which are adaptations (reduced versions) of pb and pbs from the gamlss package, respectively.

Value

A list of components from the model fit. Some of these could be redundant and removed in a future version of the package. When fit=TRUE the returned components are:

`coef`	a named column vectors of coefficients.
`n`	the total number of observations.
`m`	the number of observed configurations of the responses given covariates (if any). It corresponds to the number of rows of the dataset.
`p`	fitted probability matrix given the observations.
`Y`	the weighted matrix of the responses in multinomial format.
`x`	if requested, the model matrix.
`xx1`, `xx2`, `xx12`	vectors, for internal use.
`ynames`	vector with the names of the responses.
`tol`	the accuracy reached at the convergence.
`llp`	the penalized log-likelihood value at the convergence.
`ll`	the log-likelihood value at the convergence.
`devp`	the penalized deviance value at the convergence.
`dev`	the deviance value at the convergence.
`IM`	the estimated Information Matrix at the convergence.
`IMp`	the estimated penalized Information Matrix at the convergence. Its inverse is used to calculate the standard error of estimates.
`convergence`	logical indicating whether convergence criteria were satisfied.
`iter`	the number of iterations performed in the model fitting.
`maxB`	the number of smoothers present in the equation with the maximum number of smoothers.This also represents the number of outer iterations of the backfitting algorithm.
`ncat1`, `ncat2`	the number of levels for the two responses.
`ncat`	this is simply `ncat1*ncat2`.
`weights`	the prior weights, that is the weights initially supplied, a vector of 1s if none were.
`P`	a `p \times p` penalty matrix, where p is the number of parameters estimated (including all the intercepts), concerning the penalty term specified (if any), but excluding the additive part (if any). If no penalty terms are specified, this will be a matrix of zeros.
`gaic.m`	the penalty per parameter of the generalized AIC initially provided. It is 2 by default.
`lam1`, `lam2`, `<br /> lam3`, `lam4`	vectors of smoothing parameters initially supplied with the specification of a penalty term. If not, these are `NULL`.
`opt`	the object returned by `optim` if automatic selection of smoothing parameters has been set. `NULL` otherwise.
`etasmooth`	the matrix of predicted values for the additive part, `NULL` otherwise.
`eta`	the `n\times`(`ncat`-1) matrix of predicted values on the observed data.
`fsmooth`	a list of objects initially created by the smoothers (if any), `NULL` otherwise.
`one.smooth`	for internal use.
`df.fix`	the degrees of freedom of the parametric part of the model. Note that if a penalty term is used by specifying `penalty`, the corresponding degrees of freedoms will be counted in the parametric part of the model.
`df.fix.vect`	for internal use.
`df.smooth`	the degrees of freedom of the additive part of the model (if any), otherwise zero.
`w.res`	a `n\times`(`ncat`-1) matrix of working residuals.
`W2`	a `m`-length list of matrices of working weights.
`z`	a (`m`*`ncat`-1)-length working vector of responses.
`any.smoother`	logical indicating whether smoothers have been used in the model fit.
`Bmat`	list of bases of smoothers used in the model fit (if any), `NULL` otherwise.
`wh.eq`	list of logical vectors indicating which terms in the linear predictor are smoothers.
`wh.eq2`	list of logical vectors indicating which terms in the linear predictor are smoothers. For internal use.
`PBWB`	list of numerical matrices if smoothers are used, `NULL` otherwise. For internal use.
`BWB`	list of numerical matrices if smoothers are used, `NULL` otherwise. For internal use.
`etalist`	list of numerical vectors if smoothers are used. For internal use.
`spec.name`	list of numerical vectors if smoothers are used. For internal use.
`beta.smooth`	list of estimated coefficients for the smoothers (if any), `NULL` otherwise.
`n.smoothers`	the number of smoothers used.
`PPmat`	list of penalty matrices for the smoothers multiplied by the smoothing parameters. `NULL` if smoothers are not used.
`pnl.type`	the type of penalty used.
`xnames`	list of vectors with the names of the covariates.
`prop.smooth`	for internal use.
`GAIC`	the value of the generalized AIC from the fitted model for the specified value of `gaic.m` in `pblm.control`.
`ta`	the underlying observed contingency table of the responses, marginally to the covariates.
`set0`	the parameter setting from `pblm.control` as specified by the user.
`fo.list`	list of formulas used.
`center`	logical indicating whether argument if the covariates were centered
`scale`	logical indicating whether argument if the covariates were scaled
`type`	the type of logit used.
`plackett`	logical indicating whether the plackett inversion formula was used.
`RC.fo`	the Row-Column formula as specified by the user.
`contrasts`	the Row-Column formula contrasts as specified by the user.
`acc2`	tolerance to be used for the estimation as specified by the user.
`maxit`	maximum number of Fisher-scoring iterations as specified by the user.
`label1`, `label2`	the names of the two response variables, respectively.
`call`	the matched call from `object`.

Warning

Please be sure that the results you get are really what you are "expecting" when using uncommon specifications of argument type, mainly those involving an s logit type and its combinations with other logits. A part from the binary case, many of those have been not checked yet in the current version of the package.

The estimation of category-dependent p-splines, as outlined in Enea et al. (2014), is not allowed (work in progress).

Note

Please note that specifying a formula with interaction terms in RC.fo corresponds to a model with association structure of the type \alpha + \beta_{r} + \gamma_{c} + \delta_{rc}. In the current version of the package, non linear interaction terms of the type \alpha + \beta_r + \gamma_c + \delta_{1r}\delta_{2c}, as considered for example in Lapp et al. (1998), are not implemented here.
Furthermore, unlikely from the Dale's paramaterization, pblm does not put a minus sign to covariates.

Author(s)

Marco Enea marco.enea@unipa.it with contribution by Mikis Stasinopoulos and Bob Rigby

References

Bustami, R., Lesaffre, E., Molenberghs, G., Loos, R., Danckaerts, M. and Vlietinck, R. 2001. Modelling bivariate ordinal responses smoothly with examples from ophtalmology and genetics. Statistics in Medicine, 20, 1825-1842.

Colombi, R. and Forcina, A. 2001. Marginal regression models for the analysis of positive association of ordinal response variables. Biometrika, 88(4), 1007-1019.

Dale, J. R. (1986) Global Cross-Ratio Models for Bivariate, Discrete, Ordered Responses. Biometrics, 42(4), 909-917.

Enea, M. and Attanasio, M. 2015. A model for bivariate data with application to the analysis of university students success. Journal of Applied Statistics, http://dx.doi.org/10.1080/02664763.2014.998407

Enea, M. and Lovison, G. 2019. A penalized approach for the bivariate logistic regression model with applications to social and medical data. Statistical Modelling, 19(5), 467-500.

Enea, M., Stasinopoulos, M., Rigby, R., and Plaia, A. 2014. The pblm package: semiparametric regression for bivariate categorical responses in R. In Thomas Kneib, Fabian Sobotka, Jan Fahrenholz, Henriette Irmer (Eds.) Proceeding of the 29th International Workshop of Statistical Modelling, Volume 2, 47-50.

Glonek, G. F. V. and McCullagh, P. (1995) Multivariate logistic models. Journal of the Royal Statistical Society, Series B, 57, 533-546.

Goodman, L. A. (1979). Simple models for the analysis of cross-classications having ordered categories. Journal of the American Statistical Association, 74(367), 537-552.

Lapp, K., Molenberghs, G., and Lesaffre, E. (1998) Models for the association between ordinal variables. Computational Statistics and Data Analysis, 27, 387-411.

Examples


#NOT RUN 
# an artificial example: 
set.seed(123)
da <- expand.grid("Y1"=1:3,"Y2"=1:3,"fat1"=0:4,"fat2"=0:1)
da$Freq <- sample(0:20,3*3*5*2,replace=TRUE)
da$x1 <- rnorm(90)

#the bivariate additive proportional-odds model
m2 <- pblm(fo1=cbind(Y1,Y2) ~ fat1 + pb(x1), data=da, weights=Freq)
plot(m2)

pblm documentation built on June 19, 2025, 5:08 p.m.