multord: Multivariate Ordinal Regression Models.
In MultOrd: Multivariate Ordinal Regression Models

Description Usage Arguments Details Value See Also Examples

multord is used to multivariate ordinal regression models. Different model types are implemented and can be chosen by the use of different error.structures. Constraints on the threshold as well as on the regression parameters can be imposed.

multord(formula, data, index = NULL, response.levels = NULL,
  response.names = NULL, link = c("probit", "logit"),
  error.structure = corGeneral(~1), weights = NULL,
  coef.constraints = NULL, coef.values = NULL,
  threshold.constraints = NULL, threshold.values = NULL, se = TRUE,
  start.values = NULL, solver = "newuoa")

`formula`	an object of class `formula` of the form `y ~ X1 + ... + Xp`.
`data`	`data.frame` containing a subject index, an index for the repeated measurements, a response `y` and covariates `X1, ..., Xp`.
`index`	(optional) argument to specify the column names of the subject index and the repeated measurement index by a vector `c("subject", "repeated measurement")` in `data`. The default value of `index` is `NULL` assuming that the first column of `data` contains the subject index and the second column the repeated measurement index.
`response.levels`	(optional) `list` of length `J` to specify the category labels in case of varying categories across repeated measurements
`response.names`	(optional) `vector` of the labels of the repeated measurement index in order to specify ordering of the responses which is essential when setting constraints on the model parameters. The default value of `response.names` is `NULL` giving the natural ordering of the levels of the factor variable of repeated measurements.
`link`	specifies the link function by `"probit"` (multivariate normal distribution) or `"logit"` (multivariate logistic distribution).
`error.structure`	different `error.structures`: general correlation structure (default) `corGeneral(~1)`, general covariance structure `covGeneral(~1)`, factor dependent correlation structure `covGeneral(~f)`, factor dependent covariance structure `covGeneral(~f)`, covariate dependent equicorrelation structure `corEqui(~X)`, AR(1) correlation structure `corAR1(~1)` or factor dependent AR(1) correlation structure `corAR1(~f)`. See `error.structures` or 'Details'.
`weights`	(optional) column name of subject-specific weights in `data` which need to be constant across repeated measurements. Negative weights are not allowed.
`coef.constraints`	`vector` or `matrix` of constraints on the regression coefficients. See 'Details'.
`coef.values`	`matrix` setting fixed values on the regression coefficients. See 'Details'.
`threshold.constraints`	`vector` of constraints on the threshold parameters. See 'Details'.
`threshold.values`	`list` of (optional) fixed values for the threshold parameters. See 'Details'.
`se`	logical, if `TRUE` standard errors are computed.
`start.values`	vector of (optional) starting values.
`solver`	choose applicable solver of `optimx` (default is `newuoa`).

data

We use the long format for the input of data, where each row contains a subject index i (firmID), a repeated measurement index j (raterID), an ordinal response (rating) and all the covariates (X1, ..., X6). This long format data stucture is internally transformed to matrix of covariates Y and a list of covariate matrices X_j for all j \in J by a matching according to the subject index i and the repeated measurement index j, which are passed by an optional argument index. This is usually performed by a character vector of length two specifying the column names of the subject index and the repeated measurement index in data.

index = c("firmID", "raterID")

The default value of index is NULL assuming that the first column of data contains the subject index and the second column the repeated measurement index. In order to avoid numerical instabilities we suggest to standardize the covariates x_{ij}.

If specific constraints are imposed on the parameter set, a well defined index j \in J for the repeated measurements is needed. Therefore, a vector response.names is used to define the index number of the repeated measurement.

response.names = c("rater1", "rater2", "rater3", "defaultInd")

The default value of response.names is NULL giving the natural ordering of the levels of the factor variable for all the repeated measurements. The ordering of response.names always specifies the index of the repeated measurement unit j \in J. This ordering is essential when putting constraints on the parameters and when setting response.levels.

response.levels = list(c("C","B","BB", "BBB", "A", "AA", "AAA"),
                                     c("C","B","BB", "BBB", "A", "AA", "AAA"),
                                     c("D","C","B","Ba", "Baa", "A", "Aa", "Aaa"),
                                     c("default", "no default"))

If the categories differ across repeated measurements (either the number of categories or the category labels) one needs to specify the response.levels explicitly. This is performed by a list of length J (number of repeated measurements), where each element contains the names of the levels of the ordered categories in ascending or descending order.

formula

The ordinal responses Y (rating) are passed by a formula object. Intercepts can be included or excluded in the model depending on the model paramterization:

Model without intercept:

If the intercept should be removed the formula for a given response (rating) and covariates (X1 to Xp) has the following form:

formula = rating ~ 0 + X1 + ... + Xp.
Model with intercept:

If one wants to include an intercept in the model, there are two equivalent possibilities to set the model formula. Either one inludes the intercept explicitly by:

formula = rating ~ 1 + X1 + ... + Xp,

or by

formula = rating ~ X1 + ... + Xp.

error.structure

We allow for several different error structures depending on the model parameterization:

Correlation:
- corGeneral The most common parameterization is the general correlation matrix.
  
  error.structure <- corGeneral(~ 1)
  
  This paramterization can be extended by allowing a factor dependent correlation structure, where the correlation of each subject i depends on a given factor f. This factor f is not allowed to vary across repeated measurements j for the same subject i and due to numerical constraints only up to maximum 30 levels are allowed.
  
  error.structure = corGeneral(~ f)
- corEqui A covariate dependent equicorrelation structure, where the correlations are equal across all J dimensions and depend on some covariates X1, ..., Xp. It has to be noted that these covariates X1, ..., Xp as well as the factor f are not allowed to vary across repeated measurements j for the same subject i.
  
  error.structure <- corEqui(~ X1 + ... + Xp)
- corAR1 In order to account for some heterogeneity the AR(1) error structure is allowed to depend on covariates X1, ..., Xp that are constant over time for each subject i.
  
  error.structure = corAR1(~ X1 + ... + Xp)
Covariance:
- covGeneral
  
  In case of a full variance-covariance parameterization the standard parameterization with a full variance-covariance is obtained by:
  
  error.structure = covGeneral(~ 1)
  
  This parameterization can be extended to the factor dependent covariance structure, where the covariance of each subject depends on a given factor f:
  
  error.structure = covGeneral(~ f)

coef.constraints

In order to achieve a more flexible framework we allow for constraints on the regression coefficients. These constraints can be specified by a vector or a matrix coef.constraints. First, a simple and less flexible way by specifying a vector coef.constraints of dimension J. The ordering j \in J of the responses is given by response.names. This vector is allocated in the following way: The first element of the vector coef.constraints gets a value of 1. If the coefficients of the repeated measurement j = 2 should be equal to the coefficients of the first dimension (j=1) again a value of 1 is set. If the coefficients should be different to the coefficients of the first dimension a value of 2 is set. In analogy, if the coefficients of dimensions two and three should be the same one sets both values to 2 and if they should be different, a value of 3 is set. Constraints on the regression coefficients of the remaining repeated measurements are set analogously.

coef.constraints <- c(1,1,2,3)

This vector coef.constraints sets the coefficients of the first two raters equal

β_{1\cdot} = β_{2\cdot}

A more advanced way to specify constraints on the regression coefficients is a matrix with J rows, where each column specifies constraints on one of the p coefficients in the same way as above. In addition, a value of NA excludes a corresponding coefficient.

coef.constraints <- cbind(c(1,2,3,4), c(1,1,1,2), c(NA,NA,NA,1),
                                            c(1,1,1,NA),c(1,2,3,4),c(1,2,3,4))

This matrix coef.constraints gives the following constraints:

β_{12} = β_{22} = β_{32}
β_{13} = 0
β_{23} = 0
β_{33} = 0
β_{44} = 0
β_{14} = β_{24} = β_{34}

coef.values

In addition, specific values on regression coefficients can be set in the matrix coef.values. Parameters are removed if the value is set to zero (default for NA's in coef.constraints) or to some fixed value. If constraints on parameters are set, these dimensions need to have the same value in coef.values. Again each column corresponds to one regression coefficient.

In the credit risk example, the constraints given by coef.constraints and coef.values

coef.constraints <- cbind(c(1,2,3,4), c(1,1,1,2), c(NA,NA,NA,1),
                                           c(1,1,1,NA), c(1,2,3,4), c(1,2,3,4))

coef.values <- cbind(c(NA,NA,NA,NA), c(2,2,2,NA), c(0,0,0,NA),
                                    c(NA,NA,NA,0), c(NA,NA,NA,NA), c(NA,NA,NA,NA))

result in the following model:

\widetilde Y_{i1} = β_{11} x_{i1} + 2 x_{i2} + x_{i4} + β_{15} x_{i5} + β_{16} x_{i6},

\widetilde Y_{i2} = β_{21} x_{i1} + 2 x_{i2} + x_{i4} + β_{25} x_{i5} + β_{26} x_{i6},

\widetilde Y_{i3} = β_{31} x_{i1} + 2 x_{i2} + x_{i4} + β_{35} x_{i5} + β_{36} x_{i6},

\widetilde Y_{i4} = β_{41} x_{i1} + β_{42} x_{i2} + β_{43} x_{i3} + β_{45} x_{i5} + β_{46} x_{i6}.

Interaction terms

When constraints on the regression coefficient should be specified in models with interaction terms, the coef.constraints matrix has to be expanded manually. In case of interaction terms (specified either by X1 + X2 + X1:X2 or equivalently by X1*X2), one additional column at the end of coef.constraints for the interaction term has to be specified for numerical variables. For interaction terms including factor variables suitably more columns have to be added to the coef.constraints matrix.

threshold.constraints

Similarly, constraints on the threshold parameters can be imposed by a vector of positive integers, where dimensions with equal threshold parameters get the same integer. When restricting two outcome dimensions to be the same, one has to be careful that the number of categories in the two outcome dimensions must be the same. In our example with J=4 different outcomes we impose:

threshold.constraints <- c(1,2,1,3)

gives the following restrictions:

\bmθ_{1} = \bmθ_{3}
\bmθ_{2},\, \bmθ_{4} arbitrary.

threshold.values

In addition, threshold parameter values can be specified by threshold.values in accordance with identifiability constraints. For this purpose we use a list with J elements, where each element specifies the constraints of the particular dimension by a vector of length of the number of threshold parameters (number of categories - 1). A number specifies a threshold parameter to a specific value and NA leaves the parameter flexible. In the credit risk example we have threshold.constraints <- NULL

threshold.values <- list(c(-4,NA,NA,NA,NA,4.5),
                                       c(-4,NA,NA,NA,NA,4),
                                       c(-5,NA,NA,NA,NA,NA,4.5),
                                       c(-2.5))

θ_{11} = -4 ≤q θ_{12} ≤q θ_{13}≤q θ_{14}≤q θ_{15} ≤q θ_{16} = 4.5,
θ_{21} = -4 ≤q θ_{22} ≤q θ_{23}≤q θ_{24}≤q θ_{25} ≤q θ_{26} = 4,
θ_{31} = -5 ≤q θ_{32} ≤q θ_{33}≤q θ_{34}≤q θ_{35} ≤q θ_{36} ≤q θ_{37} = 4.5,
θ_{41} = -2.5.

The function multord returns an object of class "multord".

The functions summary and print are used to display the results. The function coef extracts the regression coefficients, a function threshold the threshold coefficients and the function sigma represents the estimation of the corresponding error structure.

An object of class "multord" is a list containing the following components:

beta

a named matrix of regression coefficients
theta

a named list of threshold parameters

sigmas

a named list of correlation (covariance) matrices, or a ?vector of coefficients in the
error.structure = corEqui setting
sebeta

a named matrix of the standard errors of the regression coefficients
setheta

a named list of the standard errors of the threshold parameters
sesigmas

a named list of the standard errors of the correlation (covariance) matrices, or a ?vector of coefficients in the error.structure = corEqui setting
rho

a list of all objects that are used in multord
rho$optRes

a vector of the optimizer output

predict.multord, print.multord, summary.multord, coef.multord, thresholds.multord, get.error.struct.multord data_multord,data_CMOR, data_multord_panel

library(MultOrd)

#load data
data("data_multord")
head(data_multord)

#-------------
# corGeneral
#-------------
#choose rater 1,2,4
#set b13 = 0, bj4 = 1 for all j
#set thresholds the same for rater 1 and 2
#different number of tresholds and labelling --> need to specify response.levels
## Not run: 
res_complex <- multord(formula = rating ~ 0 + X1 + X2 + X3 + X4 + X5,
#formula ~ 0 ... without intercept
               index = c("firmID", "rater"),
#not necessary if firmID is first column and rater is second column in data
               data = data_multord, #choose data
               response.levels = list(c("C","B","BB", "BBB", "A", "AA", "AAA"),
                                      c("C","B","BB", "BBB", "A", "AA", "AAA"),
                                      c("D","C","B","Ba", "Baa", "A", "Aa", "Aaa")),
#list for each rater;
#need to be set if specific levels/labels are desired (not in natural ordering)
               response.names = c("rater1", "rater2", "rater3"),
# set if not all raters are used and specifies ordering
               link = "probit", #probit or logit
               error.structure = corGeneral(~1), #different error structures
               weights = NULL,
               coef.constraints = cbind(c(1,2,2),
                                        c(1,1,2),
                                        c(NA,1,2),
                                        c(NA,NA,NA),
                                        c(1,1,2)),#either a vector or a matrix
               coef.values = cbind(c(NA,NA,NA),
                                   c(NA,NA,NA),
                                   c(0,NA,NA),
                                   c(1,1,1),
                                   c(NA,NA,NA)),
#matrix (possible if coef.constraints is a matrix)
               threshold.constraints = c(1,1,2), #vector
               threshold.values = NULL, #list for each rater
               se = TRUE,
               start.values = NULL,
               solver = "newuoa",
               PL.lag = NULL) #only for corAR1
print(res_complex)
res_complex.summary <- summary(res_complex)
thresholds(res_complex)
coefficients(res_complex)
get.error.struct(res_complex)

#-------------
# covGeneral
#-------------
res_cov3 <- multord(formula = rating ~ 1 + X1 + X2 + X3 + X4 + X5,
#formula ~ 0 ... without intercept
            index = c("firmID", "rater"),
#not necessary if firmID is first column and rater is second column in data
            data = data_multord, #choose data
            response.levels = list(c("C","B","BB", "BBB", "A", "AA", "AAA"),
                                   c("C","B","BB", "BBB", "A", "AA", "AAA"),
                                   c("D","C","B","Ba", "Baa", "A", "Aa", "Aaa")),
#list for each rater;
#need to be set if specific levels/labels are desired
            response.names = c("rater1", "rater2", "rater3"),
# set if not all raters are used and specifies ordering
            link = "probit", #probit or logit
            error.structure = covGeneral(~1), #different error structures
            weights = NULL,
            coef.constraints = NULL, #either a vector or a matrix
            coef.values = NULL, #matrix (possible if coef.constraints is a matrix)
            threshold.constraints = NULL, #vector
            threshold.values = list(c(-4,NA,NA,NA,NA,4.5),
                                    c(-4,NA,NA,NA,NA,4),
                                    c(-5,NA,NA,NA,NA,NA,4.5)), #list for each rater
            se = TRUE,
            start.values = NULL,
            solver = "newuoa",
            PL.lag = NULL)
print(res_cov3)
summary(res_cov3)
thresholds(res_cov3)
coefficients(res_cov3)
get.error.struct(res_cov3)


#-------------
# corAR1
#-------------
data(data_multord_panel)
head(data_multord_panel)
mult.obs = 5
res_AR1 <- multord(formula = rating ~ 0 + X1 + X2 + X3 + X4 + X5,
#formula ~ 0 ... without intercept
           index = c("firmID", "year"),
#not necessary if firmID is first column and rater is second column in data
           data = data_multord_panel, #choose data
           response.levels = rep(list(c("C","B","BB", "BBB", "A", "AA", "AAA")), mult.obs),
#list for each rater;
#need to be set if specific levels/labels are desired (not in natural ordering)
           response.names = c(2001:2005),
# set if not all raters are used and specifies ordering
           link = "probit", #probit or logit
           error.structure = corAR1(~1), #different error structures
           weights = NULL,
           coef.constraints = NULL,#either a vector or a matrix
           coef.values = NULL, #matrix (possible if coef.constraints is a matrix)
           threshold.constraints = NULL, #vector
           threshold.values = NULL, #list for each rater
           se = TRUE,
           start.values = NULL,
           solver = "newuoa",
           PL.lag = NULL) #only for corAR1
print(res_AR1)
res_AR1.summary <- summary(res_AR1)
thresholds(res_AR1)
coefficients(res_AR1)
get.error.struct(res_AR1)
get.error.struct(res_AR1, type = "r")

## End(Not run)