ordFacReg: Compute least squares or logistic regression for ordered...
In OrdFacReg: Least Squares, Logistic, and Cox-Regression with Ordered Predictors

Description Usage Arguments Details Value Author(s) References See Also Examples

View source: R/ordFacReg.r

This function computes estimates in least squares or logistic regression where coefficients corresponding to dummy variables of ordered factors are estimated to be in non-decreasing order and at least 0. An active set algorithm as described in Duembgen et al. (2007) is used.

1 2	ordFacReg(D, Z, fact, ordfact, ordering = NA, type = c("LS", "logreg"), intercept = TRUE, display = 0, eps = 0)

`D`	Response vector, either in R^n (least squares) or in \{0, 1\}^n (logistic).
`Z`	Matrix of predictors. Factors are coded with levels from 1 to j.
`fact`	Specify columns in Z that correspond to unordered factors.
`ordfact`	Specify columns in Z that correspond to ordered factors.
`ordering`	Vector of the same length as `ordfact`. Specifies ordering of ordered factors: `"i"` means that the coefficients of the corresponding ordered factor are estimated in non-decreasing order and `"d"` means non-increasing order. See the examples below for details.
`type`	Specify type of response variable.
`intercept`	If `TRUE`, an intercept (= column of all 1's) is added to the design matrix.
`display`	If `display == 1` progress of the algorithm is output.
`eps`	Quantity to which the criterion in the Basic Procedure 2 in Duembgen et al. (2007) is compared.

For a detailed description of the problem and the algorithm we refer to Rufibach (2010).

`L`	Value of the criterion function at the maximum.
`beta`	Computed regression coefficients.
`A`	Set A of active constraints.
`design.matrix`	Design matrix that was generated.

Kaspar Rufibach (maintainer)
kaspar.rufibach@gmail.com
http://www.kasparrufibach.ch

Duembgen, L., Huesler, A. and Rufibach, K. (2010). Active set and EM algorithms for log-concave densities based on complete and censored data. Technical report 61, IMSV, Univ. of Bern, available at http://arxiv.org/abs/0707.4643.

Rufibach, K. (2010). An Active Set Algorithm to Estimate Parameters in Generalized Linear Models with Ordered Predictors. Comput. Statist. Data Anal., 54, 1442-1456.

ordFacRegCox computes estimates for Cox-regression.

## ========================================================
## To illustrate least squares estimation, we generate the same data
## that was used in Rufibach (2010), Table 1.
## ========================================================

## --------------------------------------------------------
## initialization
## --------------------------------------------------------
n <- 200
Z <- NULL
intercept <- FALSE

## --------------------------------------------------------
## quantitative variables
## --------------------------------------------------------
n.q <- 3
set.seed(14012009)
if (n.q > 0){for (i in 1:n.q){Z <- cbind(Z, rnorm(n, mean = 1, sd = 2))}}

## --------------------------------------------------------
## unordered factors
## --------------------------------------------------------
un.levels <- 3
for (i in 1:length(un.levels)){Z <- cbind(Z, sample(rep(1:un.levels[i], 
    each = ceiling(n / un.levels)))[1:n])}
fact <- n.q + 1:length(un.levels)

## --------------------------------------------------------
## ordered factors
## --------------------------------------------------------
levels <- 8
for (i in 1:length(un.levels)){Z <- cbind(Z, sample(rep(1:levels[i], 
    each = ceiling(n / levels)))[1:n])}
ordfact <- n.q + length(un.levels) + 1:length(levels)

## --------------------------------------------------------
## generate data matrices
## --------------------------------------------------------
Y <- prepareData(Z, fact, ordfact, ordering = NA, intercept)$Y

## --------------------------------------------------------
## generate response
## --------------------------------------------------------
D <- apply(Y * matrix(c(rep(c(2, -3, 0), each = n), rep(c(1, 1), each = n), 
    rep(c(0, 2, 2, 2, 2, 5, 5), each = n)), ncol = ncol(Y)), 1, sum) + 
    rnorm(n, mean = 0, sd = 4)

## --------------------------------------------------------
## compute estimates
## --------------------------------------------------------
res1 <- lmLSE(D, Y)
res2 <- ordFacReg(D, Z, fact, ordfact, ordering = "i", type = "LS", intercept, 
    display = 1, eps = 0)
b1 <- res1$beta
g1 <- lmSS(b1, D, Y)$dL
b2 <- res2$beta
g2 <- lmSS(b2, D, Y)$dL
Ls <- c(lmSS(b1, D, Y)$L, lmSS(b2, D, Y)$L)
names(Ls) <- c("LSE", "ordFact") 
disp <- cbind(1:length(b1), round(cbind(b1, g1, cumsum(g1)), 4), 
    round(cbind(b2, g2, cumsum(g2)), 4))

## --------------------------------------------------------
## display results
## --------------------------------------------------------
disp
Ls

## ========================================================
## Artificial data is used to illustrate logistic regression.
## ========================================================

## --------------------------------------------------------
## initialization
## --------------------------------------------------------
set.seed(1977)
n <- 500
Z <- NULL
intercept <- FALSE

## --------------------------------------------------------
## quantitative variables
## --------------------------------------------------------
n.q <- 2
if (n.q > 0){for (i in 1:n.q){Z <- cbind(Z, rnorm(n, rgamma(2, 2, 1)))}}

## --------------------------------------------------------
## unordered factors
## --------------------------------------------------------
un.levels <- c(8, 2)
for (i in 1:length(un.levels)){Z <- cbind(Z, sample(round(runif(n, 0, 
    un.levels[i] - 1)) + 1))}
fact <- n.q + 1:length(un.levels)

## --------------------------------------------------------
## ordered factors
## --------------------------------------------------------
levels <- c(2, 4, 10)
for (i in 1:length(levels)){Z <- cbind(Z, sample(round(runif(n, 0, 
    levels[i] - 1)) + 1))}
ordfact <- n.q + length(un.levels) + 1:length(levels)

## --------------------------------------------------------
## generate response
## --------------------------------------------------------
D <- sample(c(rep(0, n / 2), rep(1, n/2)))

## --------------------------------------------------------
## generate design matrix
## --------------------------------------------------------
Y <- prepareData(Z, fact, ordfact, ordering = NA, intercept)$Y

## --------------------------------------------------------
## compute estimates
## --------------------------------------------------------
res1 <- matrix(glm.fit(Y, D, family = binomial(link = logit))$coefficients, ncol = 1)
res2 <- ordFacReg(D, Z, fact, ordfact, ordering = NA, type = "logreg", 
    intercept = intercept, display = 1, eps = 0)
b1 <- res1
g1 <- logRegDeriv(b1, D, Y)$dL
b2 <- res2$beta
g2 <- logRegDeriv(b2, D, Y)$dL
Ls <- unlist(c(logRegLoglik(res1, D, Y), res2$L))
names(Ls) <- c("MLE", "ordFact") 
disp <- cbind(1:length(b1), round(cbind(b1, g1, cumsum(g1)), 4), 
    round(cbind(b2, g2, cumsum(g2)), 4))

## --------------------------------------------------------
## display results
## --------------------------------------------------------
disp
Ls

## --------------------------------------------------------
## compute estimates when the third ordered factor should
## have *decreasing* estimated coefficients
## --------------------------------------------------------
res3 <- ordFacReg(D, Z, fact, ordfact, ordering = c("i", "i", "d"), 
    type = "logreg", intercept = intercept, display = 1, eps = 0)
b3 <- res3$beta
g3 <- logRegDeriv(b3, D, Y)$dL
Ls <- unlist(c(logRegLoglik(res1, D, Y), res2$L, res3$L))
names(Ls) <- c("MLE", "ordFact ddd", "ordFact iid") 
disp <- cbind(1:length(b1), round(cbind(b1, b2, b3), 4))

## --------------------------------------------------------
## display results
## --------------------------------------------------------
disp
Ls