tobit: Tobit Regression

Description Usage Arguments Details Value Warning Note Author(s) References See Also Examples

View source: R/family.normal.R

Description

Fits a Tobit regression model.

Usage

1
2
3
4
tobit(Lower = 0, Upper = Inf, lmu = "identitylink", lsd = "loglink",
      imu = NULL, isd = NULL,
      type.fitted = c("uncensored", "censored", "mean.obs"),
      byrow.arg = FALSE, imethod = 1, zero = "sd")

Arguments

Lower

Numeric. It is the value L described below. Any value of the linear model x_i^T beta that is less than this lowerbound is assigned this value. Hence this should be the smallest possible value in the response variable. May be a vector (see below for more information).

Upper

Numeric. It is the value U described below. Any value of the linear model x_i^T beta that is greater than this upperbound is assigned this value. Hence this should be the largest possible value in the response variable. May be a vector (see below for more information).

lmu, lsd

Parameter link functions for the mean and standard deviation parameters. See Links for more choices. The standard deviation is a positive quantity, therefore a log link is its default.

imu, isd, byrow.arg

See CommonVGAMffArguments for information.

type.fitted

Type of fitted value returned. The first choice is default and is the ordinary uncensored or unbounded linear model. If "censored" then the fitted values in the interval [L, U]. If "mean.obs" then the mean of the observations is returned; this is a doubly truncated normal distribution augmented by point masses at the truncation points (see dtobit). See CommonVGAMffArguments for more information.

imethod

Initialization method. Either 1 or 2 or 3, this specifies some methods for obtaining initial values for the parameters. See CommonVGAMffArguments for information.

zero

A vector, e.g., containing the value 1 or 2. If so, the mean or standard deviation respectively are modelled as an intercept-only. Setting zero = NULL means both linear/additive predictors are modelled as functions of the explanatory variables. See CommonVGAMffArguments for more information.

Details

The Tobit model can be written

y_i^* = x_i^T beta + e_i

where the e_i ~ N(0,sigma^2) independently and i=1,...,n. However, we measure y_i = y_i^* only if y_i^* > L and y_i^* < U for some cutpoints L and U. Otherwise we let y_i=L or y_i=U, whatever is closer. The Tobit model is thus a multiple linear regression but with censored responses if it is below or above certain cutpoints.

The defaults for Lower and Upper and lmu correspond to the standard Tobit model. Fisher scoring is used for the standard and nonstandard models. By default, the mean x_i^T beta is the first linear/additive predictor, and the log of the standard deviation is the second linear/additive predictor. The Fisher information matrix for uncensored data is diagonal. The fitted values are the estimates of x_i^T beta.

Value

An object of class "vglmff" (see vglmff-class). The object is used by modelling functions such as vglm, and vgam.

Warning

If values of the response and Lower and/or Upper are not integers then there is the danger that the value is wrongly interpreted as uncensored. For example, if the first 10 values of the response were runif(10) and Lower was assigned these value then testing y[1:10] == Lower[1:10] is numerically fraught. Currently, if any y < Lower or y > Upper then a warning is issued. The function round2 may be useful.

Note

The response can be a matrix. If so, then Lower and Upper are recycled into a matrix with the number of columns equal to the number of responses, and the recycling is done row-wise if byrow.arg = TRUE. The default order is as matrix, which is byrow.arg = FALSE. For example, these are returned in fit4@misc$Lower and fit4@misc$Upper below.

If there is no censoring then uninormal is recommended instead. Any value of the response less than Lower or greater than Upper will be assigned the value Lower and Upper respectively, and a warning will be issued. The fitted object has components censoredL and censoredU in the extra slot which specifies whether observations are censored in that direction. The function cens.normal is an alternative to tobit().

When obtaining initial values, if the algorithm would otherwise want to fit an underdetermined system of equations, then it uses the entire data set instead. This might result in rather poor quality initial values, and consequently, monitoring convergence is advised.

Author(s)

Thomas W. Yee

References

Tobin, J. (1958). Estimation of relationships for limited dependent variables. Econometrica 26, 24–36.

See Also

rtobit, cens.normal, uninormal, double.cens.normal, posnormal, CommonVGAMffArguments, round2, mills.ratio, margeff, rnorm.

Examples

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
# Here, fit1 is a standard Tobit model and fit2 is a nonstandard Tobit model
tdata <- data.frame(x2 = seq(-1, 1, length = (nn <- 100)))
set.seed(1)
Lower <- 1; Upper <- 4  # For the nonstandard Tobit model
tdata <- transform(tdata,
                   Lower.vec = rnorm(nn, Lower, 0.5),
                   Upper.vec = rnorm(nn, Upper, 0.5))
meanfun1 <- function(x) 0 + 2*x
meanfun2 <- function(x) 2 + 2*x
meanfun3 <- function(x) 2 + 2*x
meanfun4 <- function(x) 3 + 2*x
tdata <- transform(tdata,
  y1 = rtobit(nn, mean = meanfun1(x2)),  # Standard Tobit model
  y2 = rtobit(nn, mean = meanfun2(x2), Lower = Lower, Upper = Upper),
  y3 = rtobit(nn, mean = meanfun3(x2), Lower = Lower.vec, Upper = Upper.vec),
  y4 = rtobit(nn, mean = meanfun3(x2), Lower = Lower.vec, Upper = Upper.vec))
with(tdata, table(y1 == 0))  # How many censored values?
with(tdata, table(y2 == Lower | y2 == Upper))  # How many censored values?
with(tdata, table(attr(y2, "cenL")))
with(tdata, table(attr(y2, "cenU")))

fit1 <- vglm(y1 ~ x2, tobit, data = tdata, trace = TRUE)
coef(fit1, matrix = TRUE)
summary(fit1)

fit2 <- vglm(y2 ~ x2, tobit(Lower = Lower, Upper = Upper, type.f = "cens"),
             data = tdata, trace = TRUE)
table(fit2@extra$censoredL)
table(fit2@extra$censoredU)
coef(fit2, matrix = TRUE)

fit3 <- vglm(y3 ~ x2, tobit(Lower = with(tdata, Lower.vec),
                            Upper = with(tdata, Upper.vec), type.f = "cens"),
             data = tdata, trace = TRUE)
table(fit3@extra$censoredL)
table(fit3@extra$censoredU)
coef(fit3, matrix = TRUE)

# fit4 is fit3 but with type.fitted = "uncen".
fit4 <- vglm(cbind(y3, y4) ~ x2,
             tobit(Lower = rep(with(tdata, Lower.vec), each = 2),
                   Upper = rep(with(tdata, Upper.vec), each = 2),
                   byrow.arg = TRUE),
             data = tdata, crit = "coeff", trace = TRUE)
head(fit4@extra$censoredL)  # A matrix
head(fit4@extra$censoredU)  # A matrix
head(fit4@misc$Lower)       # A matrix
head(fit4@misc$Upper)       # A matrix
coef(fit4, matrix = TRUE)

## Not run:  # Plot fit1--fit4
par(mfrow = c(2, 2))

plot(y1 ~ x2, tdata, las = 1, main = "Standard Tobit model",
     col = as.numeric(attr(y1, "cenL")) + 3,
     pch = as.numeric(attr(y1, "cenL")) + 1)
legend(x = "topleft", leg = c("censored", "uncensored"),
       pch = c(2, 1), col = c("blue", "green"))
legend(-1.0, 2.5, c("Truth", "Estimate", "Naive"),
       col = c("purple", "orange", "black"), lwd = 2, lty = c(1, 2, 2))
lines(meanfun1(x2) ~ x2, tdata, col = "purple", lwd = 2)
lines(fitted(fit1) ~ x2, tdata, col = "orange", lwd = 2, lty = 2)
lines(fitted(lm(y1 ~ x2, tdata)) ~ x2, tdata, col = "black",
      lty = 2, lwd = 2)  # This is simplest but wrong!

plot(y2 ~ x2, data = tdata, las = 1, main = "Tobit model",
     col = as.numeric(attr(y2, "cenL")) + 3 +
           as.numeric(attr(y2, "cenU")),
     pch = as.numeric(attr(y2, "cenL")) + 1 +
           as.numeric(attr(y2, "cenU")))
legend(x = "topleft", leg = c("censored", "uncensored"),
       pch = c(2, 1), col = c("blue", "green"))
legend(-1.0, 3.5, c("Truth", "Estimate", "Naive"),
       col = c("purple", "orange", "black"), lwd = 2, lty = c(1, 2, 2))
lines(meanfun2(x2) ~ x2, tdata, col = "purple", lwd = 2)
lines(fitted(fit2) ~ x2, tdata, col = "orange", lwd = 2, lty = 2)
lines(fitted(lm(y2 ~ x2, tdata)) ~ x2, tdata, col = "black",
      lty = 2, lwd = 2)  # This is simplest but wrong!

plot(y3 ~ x2, data = tdata, las = 1,
     main = "Tobit model with nonconstant censor levels",
     col = as.numeric(attr(y3, "cenL")) + 2 +
           as.numeric(attr(y3, "cenU") * 2),
     pch = as.numeric(attr(y3, "cenL")) + 1 +
           as.numeric(attr(y3, "cenU") * 2))
legend(x = "topleft", leg = c("censoredL", "censoredU", "uncensored"),
       pch = c(2, 3, 1), col = c(3, 4, 2))
legend(-1.0, 3.5, c("Truth", "Estimate", "Naive"),
       col = c("purple", "orange", "black"), lwd = 2, lty = c(1, 2, 2))
lines(meanfun3(x2) ~ x2, tdata, col = "purple", lwd = 2)
lines(fitted(fit3) ~ x2, tdata, col = "orange", lwd = 2, lty = 2)
lines(fitted(lm(y3 ~ x2, tdata)) ~ x2, tdata, col = "black",
      lty = 2, lwd = 2)  # This is simplest but wrong!

plot(y3 ~ x2, data = tdata, las = 1,
     main = "Tobit model with nonconstant censor levels",
     col = as.numeric(attr(y3, "cenL")) + 2 +
           as.numeric(attr(y3, "cenU") * 2),
     pch = as.numeric(attr(y3, "cenL")) + 1 +
           as.numeric(attr(y3, "cenU") * 2))
legend(x = "topleft", leg = c("censoredL", "censoredU", "uncensored"),
       pch = c(2, 3, 1), col = c(3, 4, 2))
legend(-1.0, 3.5, c("Truth", "Estimate", "Naive"),
       col = c("purple", "orange", "black"), lwd = 2, lty = c(1, 2, 2))
lines(meanfun3(x2) ~ x2, data = tdata, col = "purple", lwd = 2)
lines(fitted(fit4)[, 1] ~ x2, tdata, col = "orange", lwd = 2, lty = 2)
lines(fitted(lm(y3 ~ x2, tdata)) ~ x2, data = tdata, col = "black",
      lty = 2, lwd = 2)  # This is simplest but wrong!

## End(Not run)

Example output

Loading required package: stats4
Loading required package: splines

FALSE  TRUE 
   52    48 

FALSE  TRUE 
   66    34 

FALSE  TRUE 
   75    25 

FALSE  TRUE 
   91     9 
VGLM    linear loop  1 :  loglikelihood = -94.952489
VGLM    linear loop  2 :  loglikelihood = -90.601839
VGLM    linear loop  3 :  loglikelihood = -90.335693
VGLM    linear loop  4 :  loglikelihood = -90.334971
VGLM    linear loop  5 :  loglikelihood = -90.334971
                    mu    loge(sd)
(Intercept) 0.06794616 -0.06234739
x2          2.02595920  0.00000000

Call:
vglm(formula = y1 ~ x2, family = tobit, data = tdata, trace = TRUE)


Pearson residuals:
            Min      1Q  Median       3Q   Max
mu       -1.847 -0.5428 -0.2101 0.610133 4.167
loge(sd) -1.315 -0.4387 -0.1416 0.005278 7.397

Coefficients: 
              Estimate Std. Error z value Pr(>|z|)    
(Intercept):1  0.06795    0.13614   0.499    0.618    
(Intercept):2 -0.06235    0.10273  -0.607    0.544    
x2             2.02596    0.23949   8.459   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Number of linear predictors:  2 

Names of linear predictors: mu, loge(sd)

Log-likelihood: -90.335 on 197 degrees of freedom

Number of iterations: 5 
VGLM    linear loop  1 :  loglikelihood = -117.40947
VGLM    linear loop  2 :  loglikelihood = -115.71547
VGLM    linear loop  3 :  loglikelihood = -115.66898
VGLM    linear loop  4 :  loglikelihood = -115.66819
VGLM    linear loop  5 :  loglikelihood = -115.66818
VGLM    linear loop  6 :  loglikelihood = -115.66818

FALSE  TRUE 
   75    25 

FALSE  TRUE 
   91     9 
                  mu    loge(sd)
(Intercept) 2.053261 -0.04893087
x2          1.880404  0.00000000
VGLM    linear loop  1 :  loglikelihood = -118.89226
VGLM    linear loop  2 :  loglikelihood = -117.66367
VGLM    linear loop  3 :  loglikelihood = -117.64018
VGLM    linear loop  4 :  loglikelihood = -117.63989
VGLM    linear loop  5 :  loglikelihood = -117.63989
VGLM    linear loop  6 :  loglikelihood = -117.63989

FALSE  TRUE 
   72    28 

FALSE  TRUE 
   90    10 
                  mu   loge(sd)
(Intercept) 1.950755 0.04030512
x2          1.899920 0.00000000
VGLM    linear loop  1 :  coefficients = 
1.98932529, 0.17966757, 2.04163067, 0.14457724, 1.86269665, 1.95343639
VGLM    linear loop  2 :  coefficients = 
1.930805570, 0.046317493, 1.928522780, 0.034401748, 1.881584193, 2.117487220
VGLM    linear loop  3 :  coefficients = 
1.950122301, 0.042628821, 1.941614233, 0.012292403, 1.901356762, 2.124136778
VGLM    linear loop  4 :  coefficients = 
1.9505733461, 0.0402054718, 1.9443736841, 0.0092747952, 1.8994721032, 
2.1226673175
VGLM    linear loop  5 :  coefficients = 
1.9507624152, 0.0403461285, 1.9447777519, 0.0088741385, 1.8999900146, 
2.1223934764
VGLM    linear loop  6 :  coefficients = 
1.9507546689, 0.0403051194, 1.9448329621, 0.0088196007, 1.8999195015, 
2.1223533164
VGLM    linear loop  7 :  coefficients = 
1.9507575223, 0.0403102638, 1.9448405049, 0.0088122162, 1.8999318719, 
2.1223478316
VGLM    linear loop  8 :  coefficients = 
1.9507571917, 0.0403093347, 1.9448415282, 0.0088112114, 1.8999299692, 
2.1223470822
VGLM    linear loop  9 :  coefficients = 
1.950757254, 0.040309475, 1.944841667, 0.008811075, 1.899930280, 2.122346981
VGLM    linear loop  10 :  coefficients = 
1.9507572446, 0.0403094522, 1.9448416863, 0.0088110564, 1.8999302303, 
2.1223469667
VGLM    linear loop  11 :  coefficients = 
1.9507572462, 0.0403094558, 1.9448416889, 0.0088110539, 1.8999302382, 
2.1223469648
VGLM    linear loop  12 :  coefficients = 
1.9507572459, 0.0403094552, 1.9448416892, 0.0088110535, 1.8999302370, 
2.1223469646
     y3    y4
1 FALSE  TRUE
2 FALSE  TRUE
3  TRUE  TRUE
4  TRUE  TRUE
5  TRUE  TRUE
6  TRUE FALSE
     y3    y4
1 FALSE FALSE
2 FALSE FALSE
3 FALSE FALSE
4 FALSE FALSE
5 FALSE FALSE
6 FALSE FALSE
          [,1]      [,2]
[1,] 0.6867731 0.6867731
[2,] 1.0918217 1.0918217
[3,] 0.5821857 0.5821857
[4,] 1.7976404 1.7976404
[5,] 1.1647539 1.1647539
[6,] 0.5897658 0.5897658
         [,1]     [,2]
[1,] 3.689817 3.689817
[2,] 4.021058 4.021058
[3,] 3.544539 3.544539
[4,] 4.079014 4.079014
[5,] 3.672708 3.672708
[6,] 4.883644 4.883644
                 mu1  loge(sd1)      mu2   loge(sd2)
(Intercept) 1.950757 0.04030946 1.944842 0.008811054
x2          1.899930 0.00000000 2.122347 0.000000000

VGAM documentation built on Jan. 16, 2021, 5:21 p.m.