restriktor: Estimating linear regression models with (in)equality...

View source: R/restriktor.R

restriktorR Documentation

Estimating linear regression models with (in)equality restrictions

Description

Function restriktor estimates the parameters of an univariate and a multivariate linear model (lm), a robust estimation of the linear model (rlm) and a generalized linear model (glm) subject to linear equality and linear inequality restrictions. It is a convenience function. The real work horses are the conLM, conMLM, conRLM and the conGLM functions.

Usage

restriktor(object, constraints = NULL, ...)

## S3 method for class 'lm'
conLM(object, constraints = NULL, se = "standard", 
      B = 999, rhs = NULL, neq = 0L, mix.weights = "pmvnorm", 
      mix.bootstrap = 99999L, parallel = "no", ncpus = 1L, 
      cl = NULL, seed = NULL, control = list(), 
      verbose = FALSE, debug = FALSE, ...)
      
## S3 method for class 'rlm'
conRLM(object, constraints = NULL, se = "standard", 
       B = 999, rhs = NULL, neq = 0L, mix.weights = "pmvnorm", 
       mix.bootstrap = 99999L, parallel = "no", ncpus = 1L, 
       cl = NULL, seed = NULL, control = list(), 
       verbose = FALSE, debug = FALSE, ...)
       
## S3 method for class 'glm'
conGLM(object, constraints = NULL, se = "standard", 
       B = 999, rhs = NULL, neq = 0L, mix.weights = "pmvnorm", 
       mix.bootstrap = 99999L, parallel = "no", ncpus = 1L, 
       cl = NULL, seed = NULL, control = list(), 
       verbose = FALSE, debug = FALSE, ...)

## S3 method for class 'mlm'
conMLM(object, constraints = NULL, se = "none", 
       B = 999, rhs = NULL, neq = 0L, mix.weights = "pmvnorm", 
       mix.bootstrap = 99999L, parallel = "no", ncpus = 1L, 
       cl = NULL, seed = NULL, control = list(), 
       verbose = FALSE, debug = FALSE, ...)

Arguments

object

a fitted linear model object of class "lm", "mlm", "rlm" or "glm". For class "rlm" only the loss function bisquare is supported for now, otherwise the function gives an error.

constraints

there are two ways to constrain parameters. First, the constraint syntax consists of one or more text-based descriptions, where the syntax can be specified as a literal string enclosed by single quotes. Only the names of coef(model) can be used as names. See details for more information. Note that objects of class "mlm" do not (yet) support this method.

Second, the constraint syntax consists of a matrix R (or a vector in case of one constraint) and defines the left-hand side of the constraint R\theta \ge rhs, where each row represents one constraint. The number of columns needs to correspond to the number of parameters estimated (\theta) by model. The rows should be linear independent, otherwise the function gives an error. For more information about constructing the matrix R and rhs see details.

se

if "standard" (default), conventional standard errors are computed based on inverting the observed augmented information matrix. If "const", homoskedastic standard errors are computed. If "HC0" or just "HC", heteroskedastic robust standard errors are computed (a.k.a Huber White). The options "HC1", "HC2", "HC3", "HC4", "HC4m", and "HC5" are refinements of "HC0". For more details about heteroskedastic robust standard errors see the sandwich package. If "boot.standard", bootstrapped standard errors are computed using standard bootstrapping. If "boot.model.based" or "boot.residual", bootstrapped standard errors are computed using model-based bootstrapping. If "none", no standard errors are computed. Note that for objects of class "mlm" no standard errors are available (yet).

B

integer; number of bootstrap draws for se. The default value is set to 999. Parallel support is available.

rhs

vector on the right-hand side of the constraints; R\theta \ge rhs. The length of this vector equals the number of rows of the constraints matrix R and consists of zeros by default. Note: only used if constraints input is a matrix or vector.

neq

integer (default = 0) treating the number of constraints rows as equality constraints instead of inequality constraints. For example, if neq = 2, this means that the first two rows of the constraints matrix R are treated as equality constraints. Note: only used if constraints input is a matrix or vector.

mix.weights

if "pmvnorm" (default), the chi-bar-square weights are computed based on the multivariate normal distribution function with additional Monte Carlo steps. If "boot", the chi-bar-square weights are computed using parametric bootstrapping. If "none", no chi-bar-square weights are computed. The weights are necessary in the restriktor.summary function for computing the GORIC. Moreover, the weights are re-used in the iht function for computing the p-value for the test-statistic, unless the p-value is computed directly via bootstrapping.

mix.bootstrap

integer; number of bootstrap draws for mix.weights = "boot". The default value is set to 99999. Parallel support is available.

parallel

the type of parallel operation to be used (if any). If missing, the default is set "no".

ncpus

integer: number of processes to be used in parallel operation: typically one would chose this to the number of available CPUs.

cl

an optional parallel or snow cluster for use if parallel = "snow". If not supplied, a cluster on the local machine is created for the duration of the restriktor call.

seed

seed value.

control

a list of control arguments:

  • absval tolerance criterion for convergence (default = sqrt(.Machine$double.eps)).

  • maxit the maximum number of iterations for the optimizer (default = 10000).

  • tol numerical tolerance value. Estimates smaller than tol are set to 0.

verbose

logical; if TRUE, information is shown at each bootstrap draw.

debug

if TRUE, debugging information about the constraints is printed out.

...

no additional arguments for now.

Details

The constraint syntax can be specified in two ways. First as a literal string enclosed by single quotes as shown below:

myConstraints <- '
    # 1. inequality constraints
      x1 > 0
      x1 < x2
    # or
      0 < x1 < x2
    
    ! 2. equality constraints  
      x3 == x4; x4 == x5 
    # or 
      x3 = x4; x4 = x5 
    # or
      x3 = x4 = x5'
  

The variable names x1 to x5 refer to the corresponding regression coefficient. Thus, constraints are impose on regression coefficients and not on the data.

Second, the above constraints syntax can also be written in matrix/vector notation as:

(The first column refers to the intercept, the remaining five columns refer to the regression coefficients x1 to x5.)

myConstraints <-
    rbind(c(0, 0, 0,-1, 1, 0), #equality constraint x3 = x4
          c(0, 0, 0, 0,-1, 1), #equality constraint x4 = x5
          c(0, 1, 0, 0, 0, 0), #inequality constraint x1 > rhs
          c(0,-1, 1, 0, 0, 0)) #inequality constraint x1 < x2
  
# the length of rhs is equal to the number of myConstraints rows.      
myRhs <- c(0,0,0,0) 
    
# the first two rows should be considered as equality constraints
myNeq <- 2  

Blank lines and comments can be used in between the constraints, and constraints can be split over multiple lines. Both the hashtag (#) and the exclamation (!) characters can be used to start a comment. Multiple constraints can be placed on a single line if they are separated by a semicolon (;), a comma (,) or the "&" sign.

In addition compound constraints can be stated via one or more longer equality or inequality sentences e.g., 'x1 > x2 > x3; x3 < 4 < x4' or 'x1 == x2 == x3 & x4 = 1'. Alternatively, the constrains can be specifies as '(x1, x2) > (x3, x4)' which is equivalent to 'x1 > x3; x1 > x4; x2 > x3; x2 > x4'.

There can be three types of text-based descriptions in the constraints syntax:

  1. Equality constraints: The "==" or "=" operator can be used to define equality constraints (e.g., x1 = 1 or x1 = x2).

  2. Inequality constraints: The "<" or ">" operator can be used to define inequality constraints (e.g., x1 > 1 or x1 < x2).

  3. Newly defined parameters: The ":=" operator can be used to define new parameters, which take on values that are an arbitrary function of the original model parameters. The function must be specified in terms of the parameter names in coef(model) (e.g., new := x1 + 2*x2). By default, the standard errors for these defined parameters are computed by using the so-called Delta method.

Variable names of interaction effects in objects of class lm, rlm and glm contain a semi-colon (:) between the variables. To impose constraints on parameters of interaction effects, the semi-colon must be replaced by a dot (.) (e.g., x3:x4 becomes x3.x4). In addition, the intercept variable names is shown as "(Intercept)". To impose restrictions on the intercept both parentheses must be replaced by a dot ".Intercept." (e.g.,.Intercept. > 10). Note: in most practical situations we do not impose restrictions on the intercept because we do not have prior knowledge about the intercept. Moreover, the sign of the intercept can be changed arbitrarily by shifting the response variable y.

Each element can be modified using arithmetic operators. For example, if x2 is expected to be twice as large as x1, then "2*x2 = x1".

If constraints = NULL, the unrestricted model is fitted.

### Note on not full row-rank ###

If the restriction matrix is not of full row-rank, this means one of the following:

  • There is at least one redundant restriction. Then, either

    • [a] Leave the redundant one out

    • [b] Use another (more time-consuming) way of obtaining the level probabilities for the penalty term (goric function does this by default): Bootstrapping, as discussed above.

  • There is at least one range restriction (e.g., -2 < group1 < 2). Such a restriction can be evaluated but there is a sensitivity (of a scaling factor in the covariance matrix, like with a prior in a Bayes factor) which currently cannot be checked for.

  • There is at least one conflicting restriction (e.g., 2 < group1 < -2).

Such a restriction can evidently never hold and is thus impossible to evaluate. To prevent this type of error delete the one that is incorrect.

Value

An object of class restriktor, for which a print and a summary method are available. More specifically, it is a list with the following items:

CON

a list with useful information about the restrictions.

call

the matched call.

timing

how much time several tasks take.

parTable

a parameter table with information about the observed variables in the model and the imposed restrictions.

b.unrestr

unrestricted regression coefficients.

b.restr

restricted regression coefficients.

residuals

restricted residuals.

wresid

a working residual, weighted for "inv.var" weights only (rlm only)

fitted

restricted fitted mean values.

weights

(only for weighted fits) the specified weights.

wgt

the weights used in the IWLS process (rlm only).

scale

the robust scale estimate used (rlm only).

stddev

a scale estimate used for the standard errors.

R2.org

unrestricted R-squared.

R2.reduced

restricted R-squared.

df.residual

the residual degrees of freedom

s2.unrestr

mean squared error of unrestricted model.

s2.restr

mean squared error of restricted model.

loglik

restricted log-likelihood.

Sigma

variance-covariance matrix of unrestricted model.

constraints

matrix with restrictions.

rhs

vector of right-hand side elements.

neq

number of equality restrictions.

wt.bar

chi-bar-square mixing weights or a.k.a. level probabilities.

iact

active restrictions.

converged

did the IWLS converge (rlm only)?

iter

number of iteration needed for convergence (rlm only).

bootout

object of class boot. Only available if bootstrapped standard errors are requested, else bootout = NULL.

control

list with control options.

model.org

original model.

se

as input. This information is needed in the summary function.

information

observed information matrix with the inverted information matrix and the augmented information matrix as attributes.

Author(s)

Leonard Vanbrabant and Yves Rosseel

References

Schoenberg, R. (1997). Constrained Maximum Likelihood. Computational Economics, 10, 251–266.

Shapiro, A. (1988). Towards a unified theory of inequality-constrained testing in multivariate analysis. International Statistical Review 56, 49–62.

Silvapulle, M.J. and Sen, P.K. (2005). Constrained Statistical Inference. Wiley, New York

See Also

iht, goric

Examples

## lm
## unrestricted linear model for ages (in months) at which an 
## infant starts to walk alone.

# prepare data
DATA1 <- subset(ZelazoKolb1972, Group != "Control")

# fit unrestricted linear model
fit1.lm <- lm(Age ~ -1 + Group, data = DATA1)

# the variable names can be used to impose restrictions on
# the corresponding regression parameters.
coef(fit1.lm)

# restricted linear model with restrictions that the walking 
# exercises would not have a negative effect of increasing the 
# mean age at which a child starts to walk. 
fit1.con <- restriktor(fit1.lm, constraints = ' GroupActive  < GroupPassive < GroupNo ')
summary(fit1.con)

# Or in matrix notation.
myConstraints1 <- rbind(c(-1, 1, 0),
                        c( 0,-1, 1))
myRhs1 <- rep(0L, nrow(myConstraints1)) 
myNeq1 <- 0

fit1.con <- restriktor(fit1.lm, constraints = myConstraints1,
                       rhs = myRhs1, neq = myNeq1)
summary(fit1.con)

#########################
## Artificial examples ##
#########################
library(MASS)

## mlm
# generate data
n <- 30
mu <- rep(0, 4)
Sigma <- matrix(5,4,4)
  diag(Sigma) <- c(10,10,10,10)
# 4 Y's.
Y <- mvrnorm(n, mu, Sigma)

# fit unrestricted multivariate linear model
fit.mlm <- lm(Y ~ 1)

# constraints
myConstraints2 <- diag(0,4)
  diag(myConstraints2) <- 1

# fit restricted multivariate linear model
fit2.con <- restriktor(fit.mlm, constraints = myConstraints2)

summary(fit2.con)


## rlm
# generate data
n <- 10
means <- c(1,2,1,3)
nm <- length(means)
group <- as.factor(rep(1:nm, each = n))
y <- rnorm(n * nm, rep(means, each = n))
DATA2 <- data.frame(y, group)

# fit unrestricted robust linear model
fit3.rlm <- rlm(y ~ -1 + group, data = DATA2, method = "MM")
coef(fit3.rlm)

## increasing means
myConstraints3 <- ' group1 < group2 < group3 < group4 '

# fit restricted robust linear model and compute 
# Huber-White (robust) standard errors.
fit3.con <- restriktor(fit3.rlm, constraints = myConstraints3, 
                       se = "HC0")
summary(fit3.con)


## increasing means in matrix notation.
myConstraints3 <- rbind(c(-1, 1, 0, 0),
                        c( 0,-1, 1, 0),
                        c( 0, 0,-1, 1))
myRhs3 <- rep(0L, nrow(myConstraints3)) 
myNeq3 <- 0

fit3.con <- restriktor(fit3.rlm, constraints = myConstraints3,
                       rhs = myRhs3, neq = myNeq3, se = "HC0")
summary(fit3.con)

## equality restrictions only.
myConstraints4 <- ' group1 = group2 = group3 = group4 '
                    
fit4.con <- restriktor(fit3.rlm, constraints = myConstraints4)
summary(fit4.con)


## combination of equality and inequality restrictions.
myConstraints5 <- ' group1  = group2
                    group3  < group4 '

# fit restricted model and compute model-based bootstrapped 
# standard errors. We only generate 9 bootstrap samples in this 
# example; in practice you may wish to use a much higher number.
fit5.con <- restriktor(fit3.rlm, constraints = myConstraints4, 
                       se = "boot.model.based", B = 9)
# an error is probably thrown, due to a too low number of bootstrap draws.
summary(fit5.con)

# restriktor can also be used to define effects using the := operator 
# and impose restrictions on them. For example, compute the average 
# effect (AVE) and impose the restriction AVE > 0.
# generate data
n <- 30
b0 <- 10; b1 = 0.5; b2 = 1; b3 = 1.5
X <- c(rep(c(0), n/2), rep(c(1), n/2))
set.seed(90) 
Z <- rnorm(n, 16, 5)
y <- b0 + b1*X + b2*Z + b3*X*Z + rnorm(n, 0, sd = 10) 
DATA3 = data.frame(cbind(y, X, Z))

# fit linear model with interaction
fit6.lm <- lm(y ~ X*Z, data = DATA3)

fit6.con <- restriktor(fit6.lm, constraints = ' AVE := X + 16.86137*X.Z; 
                                                AVE > 0 ')
summary(fit6.con)

restriktor documentation built on Oct. 4, 2023, 9:13 a.m.