intReg: Interval Regression

Description Usage Arguments Details Value Author(s) See Also Examples

Description

This function estimates interval regression using either common intervals for all observations, or observation-specific intervals potentially including point observations. Normal, logistic, log-log, and Cauchy disturbances are supported.

Usage

1
2
3
4
5
6
7
intReg(formula, start, boundaries, ...,
       contrasts = NULL, Hess = FALSE, model = TRUE,
       method = c("probit", "logistic", "cloglog", "cauchit", "model.frame"),
       minIntervalWidth=10*sqrt(.Machine$double.eps),
       print.level = 0,
       data, subset, weights, na.action,
       iterlim=100)

Arguments

formula

an object of class '"formula"' (or one that can be coerced to that class): a symbolic description of the model to be fitted. The left-hand-side variable must be either a factor, describing the intervals where the observations fall to, or a numeric matrix of two columns, describing the interval boundaries for each observation. See details below. See also lm for explanation how to write formulas.

data

an optional data frame, list or environment (or object coercible by 'as.data.frame' to a data frame) containing the variables in the model. If not found in 'data', the variables are taken from 'environment(formula)', typically the environment from which ‘intReg’ is called.

weights

an optional vector of weights to be used in the fitting process. Should be 'NULL' or a numeric vector. If non-NULL, weighted likelihood is maximized (that is, maximizing ‘sum(w*loglik)’).

start

Initial values for the optimization algorithm. if ‘NULL’, these are calculated using interval means. Note that intReg expects the full-length inital values, including the parameter boundaries and the standard deviation of the error term. See the example below.

boundaries

boundaries for intervals. See details.

...

further arguments to maxLik

subset

an optional logical vector specifying a subset of observations to be used in the fitting process.

na.action

a function which indicates what should happen when the data contain ‘NA’s. The default is set by the ‘na.action’ setting of sQuoteoptions, and is ‘na.fail’ if that is unset. The “factory-fresh” default is ‘na.omit’. Another possible value is ‘NULL’, no action. Value ‘na.exclude’ can be useful.

contrasts

an optional list. See the ‘contrasts.arg’ of model.matrix.default.

Hess

Should Hessian of the model be returned

model

logical for whether the model matrix should be returned.

method

character, distribution of disturbances or ‘model.frame’. ‘probit’, ‘logistic’, ‘cloglog’ and ‘cauchit’ assume these disturbance distributions. ‘model.frame’ returns the model frame. The default value ‘probit’ assumes normal distribution.

minIntervalWidth

minimal width of the interval. If an interval is shorter than this, it is assumed to be a point-observation

print.level

output of run-time information: higher level prints more.

iterlim

maximum number of optimization iterations

Details

Interval regression is a form of linear regression where only intervals (i.e numeric upper and lower bounds) where the observations fall are visible of the otherwise continuous outcome variable. The current implementation assumes known distribution of the error term (default is normal) and estimates the model using Maximum Likelihood.

The intervals may be specified in two ways: either common intervals for all the observations by using argument 'boundaries', or by specifying the response as N x 2 matrix, where columns correspond to the lower- and upper bound for the individual observations. One may want to give informative names to the individual boundaries, otherwise these will be names ‘Boundary 1’, ‘Boundary 2’ etc. These names will appear to the estimation results. For observations-specific boundaries, the names are generated automatically.

Note that the limit of an interval observation is point observation when interval length approaches to zero. In order to avoid numerical problems, intervals narrower than minIntervalWidth are assumed to be point observations and handled separately.

Value

Object of class ‘intReg’ which inherits from class ‘maxLik’.

There are several methods, including summary and coef, partly inherited from "maxLik" class.

Note that the boundaries are passed as fixed parameters to the maxLik estimation routine and hence returned as fixed estimates with standard errors set to zero.

Author(s)

Ott Toomet otoomet@gmail.com, a lot of code borrowed from polr (W. N. Venables and B. D. Ripley.)

See Also

polr

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
## Example of observation-specific boundaries
## Estimate the willingness to pay for the Kakadu National Park
## Data given in intervals -- 'lower' for lower bound and 'upper' for upper bound.
## Note that dichotomous-coice answers are already coded to 'lower' and 'upper'
data(Kakadu, package="Ecdat")
set.seed(1)
Kakadu <- Kakadu[sample(nrow(Kakadu), 500),]
                        # subsample to speed up the estimation
## Estimate in log form, change 999 to Inf
lb <- log(Kakadu$lower)
ub <- Kakadu$upper
ub[ub > 998] <- Inf
ub <- log(ub)
y <- cbind(lb, ub)
m <- intReg(y ~ sex + log(income) + age + schooling + 
              recparks + jobs + lowrisk + wildlife + future + aboriginal + finben +
              mineparks + moreparks + gov +
              envcon + vparks + tvenv + major, data=Kakadu)
## You may want to compare the results to Werner (1999),
## Journal of Business and Economics Statistics 17(4), pp 479-486
print(summary(m))
##
## Example of setting initial values
##
st <- coef(m, boundaries=TRUE)
st[1:19] <- 1     # set all coefficients to 1
st["sigma"] <- 1  # set standard deviation to 1
m <- intReg(y ~ sex + log(income) + age + schooling + 
              recparks + jobs + lowrisk + wildlife + future + aboriginal + finben +
              mineparks + moreparks + gov +
              envcon + vparks + tvenv + major,
              start=st,
              data=Kakadu)
## Note: the results will be the same as above
##
## Example of common intervals for all the observations
##
data(Bwages, package="Ecdat")
## calculate an ordinary Mincer-style wage regression.  First by OLS and
## thereafter cut the wage to intervals and estimate with 'intReg'
## Note: gross hourly wage rate in EUR
ols <- lm(log(wage) ~ factor(educ) + poly(exper, 2), data=Bwages)
cat("OLS estimate:\n")
print(summary(ols))
## Now we censor the wages to intervals
intervals <- c(0, 5, 10, 15, 25, Inf)
salary <- cut(Bwages$wage, intervals)
int <- intReg(salary ~ factor(educ) + poly(exper, 2), data=Bwages, boundaries=log(intervals))
## Note: use logs for the intervals in Euros.  We do not have to
## transform salaris to log form as this does not change the intervals.
## Ignore any warnings
cat("Interval regression:\n")
print(summary(int))

Example output

Loading required package: miscTools
Loading required package: maxLik

Please cite the 'maxLik' package as:
Henningsen, Arne and Toomet, Ott (2011). maxLik: A package for maximum likelihood estimation in R. Computational Statistics 26(3), 443-458. DOI 10.1007/s00180-010-0217-1.

If you have questions, suggestions, or comments regarding the 'maxLik' package, please use a forum or 'tracker' at maxLik's R-Forge site:
https://r-forge.r-project.org/projects/maxlik/
--------------------------------------------
Interval regression
Maximum Likelihood estimation
BHHH maximisation, 10 iterations
Return code 2: successive function values within tolerance limit
Log-Likelihood: -388.4076 
500 observations, 20 free parameters (df = 480)
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  6.15184    2.45620   2.505 0.012589 *  
sexmale     -0.29115    0.42520  -0.685 0.493846    
log(income)  0.27120    0.26269   1.032 0.302420    
age         -0.04027    0.01210  -3.327 0.000944 ***
schooling   -0.08597    0.11507  -0.747 0.455360    
recparks     0.02805    0.17717   0.158 0.874289    
jobs        -0.58741    0.17699  -3.319 0.000972 ***
lowrisk     -0.82210    0.17765  -4.628 4.77e-06 ***
wildlife     0.25476    0.28975   0.879 0.379713    
future       0.05356    0.23680   0.226 0.821153    
aboriginal   0.12433    0.17646   0.705 0.481396    
finben      -0.43784    0.17458  -2.508 0.012471 *  
mineparks    0.55647    0.16470   3.379 0.000788 ***
moreparks    0.46348    0.17511   2.647 0.008392 ** 
gov         -1.28939    0.80773  -1.596 0.111076    
envconyes    0.16045    0.40575   0.395 0.692690    
vparksyes    0.21119    0.42780   0.494 0.621760    
tvenv       -0.13962    0.15103  -0.924 0.355730    
majoryes     0.63731    0.38715   1.646 0.100386    
sigma        2.93924    0.29179  10.073  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
--------------------------------------------
OLS estimate:

Call:
lm(formula = log(wage) ~ factor(educ) + poly(exper, 2), data = Bwages)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.60391 -0.15586 -0.00126  0.17378  1.07186 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)      1.97292    0.02964  66.570  < 2e-16 ***
factor(educ)2    0.14117    0.03408   4.142 3.63e-05 ***
factor(educ)3    0.29670    0.03283   9.039  < 2e-16 ***
factor(educ)4    0.45290    0.03372  13.433  < 2e-16 ***
factor(educ)5    0.62903    0.03401  18.496  < 2e-16 ***
poly(exper, 2)1  6.37744    0.30192  21.123  < 2e-16 ***
poly(exper, 2)2 -2.20628    0.28756  -7.672 3.06e-14 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.2868 on 1465 degrees of freedom
Multiple R-squared:  0.3767,	Adjusted R-squared:  0.3741 
F-statistic: 147.6 on 6 and 1465 DF,  p-value: < 2.2e-16

Interval regression:
--------------------------------------------
Interval regression
Maximum Likelihood estimation
BHHH maximisation, 8 iterations
Return code 2: successive function values within tolerance limit
Log-Likelihood: -1353.956 
1472 observations, 8 free parameters (df = 1464)
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)      1.893994   0.038646  49.009  < 2e-16 ***
factor(educ)2    0.213102   0.043328   4.918 9.71e-07 ***
factor(educ)3    0.368797   0.041728   8.838  < 2e-16 ***
factor(educ)4    0.530030   0.043067  12.307  < 2e-16 ***
factor(educ)5    0.717596   0.041679  17.217  < 2e-16 ***
poly(exper, 2)1  6.498874   0.355411  18.286  < 2e-16 ***
poly(exper, 2)2 -2.168018   0.289443  -7.490 1.18e-13 ***
sigma            0.284659   0.006062  46.961  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
--------------------------------------------

intReg documentation built on May 2, 2019, 4:43 p.m.