Description Usage Arguments Details Value Note Author(s) See Also Examples
This function estimates interval regression using either common intervals for all observations, or observation-specific intervals. Normal, logistic, log-log, and Cauchy disturbances are supported.
1 2 3 4 5 |
formula |
an object of class '"formula"' (or one that can be coerced to
that class): a symbolic description of the model to be
fitted. The left-hand-side variable must be either a factor,
describing the intervals where the observations fall to, or a matrix of
two columns, describing the interval boundaries for each
observation. See details below, see
also |
data |
an optional data frame, list or environment (or object coercible by 'as.data.frame' to a data frame) containing the variables in the model. If not found in 'data', the variables are taken from 'environment(formula)', typically the environment from which ‘intReg’ is called. |
weights |
an optional vector of weights to be used in the fitting process. Should be 'NULL' or a numeric vector. If non-NULL, weighted likelihood is maximized (that is, maximizing ‘sum(w*loglik)’). |
start |
Initial values for the optimization algorithm. if
‘NULL’, initial values are calculated based on interval
means. Note that |
boundaries |
Boundaries for intervals. See details. |
... |
further arguments to |
subset |
an optional logical vector specifying a subset of observations to be used in the fitting process. |
na.action |
a function which indicates what should happen when the data contain ‘NA’s. The default is set by the ‘na.action’ setting of sQuoteoptions, and is ‘na.fail’ if that is unset. The “factory-fresh” default is ‘na.omit’. Another possible value is ‘NULL’, no action. Value ‘na.exclude’ can be useful. |
contrasts |
an optional list. See the ‘contrasts.arg’ of
|
Hess |
Should Hessian of the model be returned |
model |
logical for whether the model matrix should be returned. |
method |
character, distribution of disturbances or ‘model.frame’. ‘probit’, ‘logistic’, ‘cloglog’ and ‘cauchit’ assume these disturbance distributions. ‘model.frame’ returns the model frame. The default value ‘probit’ assumes normal distribution. |
print.level |
output of run-time information: higher level prints more. |
iterlim |
maximum number of optimization iterations |
Interval regression is a form of linear regression where only intervals (i.e numeric upper and lower bounds) where the observations fall are visible of the otherwise continuous outcome variable. The current implementation assumes known distribution of the error term (default is normal) and estimates the model using Maximum Likelihood.
The intervals may be specified in two ways: either common intervals for all the observations by using argument 'boundaries', or by specifying the response as Nx2 matrix, where columns correspond to the lower- and upper bound for the individual observations. If you specify the common boundaries, you may want to give names to the components of 'boundaries'. Otherwise they will be names ‘Boundary 1’, ‘Boundary 2’ etc. These names will appear to the estimation results. For observations-specific boundaries, the names are generated automatically.
Object of class ‘intReg’ which inherits from class ‘maxLik’.
There are several methods, including summary
and
coef
, partly inherited from "maxLik" class.
Note that the boundaries are passed as fixed parameters to the
maxLik
estimation routine and hence returned as fixed estimates
with standard errors set to zero.
A lot of code is borrowed from polr
from the package MASS
Ott Toomet otoomet@gmail.com
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 | ## Example of observation-specific boundaries
## Estimate the willingness to pay for the Kakadu National Park
## Data given in intervals -- 'lower' for lower bound and 'upper' for upper bound.
## Note that dichotomous-coice answers are already coded to 'lower' and 'upper'
data(Kakadu, package="Ecdat")
set.seed(1)
Kakadu <- Kakadu[sample(nrow(Kakadu), 500),]
# subsample to speed up the estimation
## Estimate in log form, change 999 to Inf
lb <- log(Kakadu$lower)
ub <- Kakadu$upper
ub[ub > 998] <- Inf
ub <- log(ub)
y <- cbind(lb, ub)
m <- intReg(y ~ sex + log(income) + age + schooling +
recparks + jobs + lowrisk + wildlife + future + aboriginal + finben +
mineparks + moreparks + gov +
envcon + vparks + tvenv + major, data=Kakadu)
## You may want to compare the results to Werner (1999),
## Journal of Business and Economics Statistics 17(4), pp 479-486
print(summary(m))
##
## Example of setting initial values
##
st <- coef(m, boundaries=TRUE)
st[1:19] <- 1 # set all coefficients to 1
st["sigma"] <- 1 # set standard deviation to 1
m <- intReg(y ~ sex + log(income) + age + schooling +
recparks + jobs + lowrisk + wildlife + future + aboriginal + finben +
mineparks + moreparks + gov +
envcon + vparks + tvenv + major,
start=st,
data=Kakadu)
## Note: the results will be the same as above
##
## Example of common intervals for all the observations
##
library(Ecdat)
data(Bwages)
## calculate an ordinary Mincer-style wage regression. First by OLS and
## thereafter cut the wage to intervals and estimate with 'intReg'
## Note: gross hourly wage rate in EUR
ols <- lm(log(wage) ~ factor(educ) + poly(exper, 2), data=Bwages)
cat("OLS estimate:\n")
print(summary(ols))
## Now we censor the wages to intervals
intervals <- c(0, 5, 10, 15, 25, Inf)
salary <- cut(Bwages$wage, intervals)
int <- intReg(salary ~ factor(educ) + poly(exper, 2), data=Bwages, boundaries=log(intervals))
## Note: use logs for the intervals in Euros. We do not have to
## transform salaris to log form as this does not change the intervals.
## Ignore any warnings
cat("Interval regression:\n")
print(summary(int))
|
Loading required package: miscTools
Loading required package: maxLik
Please cite the 'maxLik' package as:
Henningsen, Arne and Toomet, Ott (2011). maxLik: A package for maximum likelihood estimation in R. Computational Statistics 26(3), 443-458. DOI 10.1007/s00180-010-0217-1.
If you have questions, suggestions, or comments regarding the 'maxLik' package, please use a forum or 'tracker' at maxLik's R-Forge site:
https://r-forge.r-project.org/projects/maxlik/
--------------------------------------------
Interval regression
Maximum Likelihood estimation
BHHH maximisation, 10 iterations
Return code 2: successive function values within tolerance limit
Log-Likelihood: -388.4076
500 observations, 20 free parameters (df = 480)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.15184 2.45620 2.505 0.012589 *
sexmale -0.29115 0.42520 -0.685 0.493846
log(income) 0.27120 0.26269 1.032 0.302420
age -0.04027 0.01210 -3.327 0.000944 ***
schooling -0.08597 0.11507 -0.747 0.455360
recparks 0.02805 0.17717 0.158 0.874289
jobs -0.58741 0.17699 -3.319 0.000972 ***
lowrisk -0.82210 0.17765 -4.628 4.77e-06 ***
wildlife 0.25476 0.28975 0.879 0.379713
future 0.05356 0.23680 0.226 0.821153
aboriginal 0.12433 0.17646 0.705 0.481396
finben -0.43784 0.17458 -2.508 0.012471 *
mineparks 0.55647 0.16470 3.379 0.000788 ***
moreparks 0.46348 0.17511 2.647 0.008392 **
gov -1.28939 0.80773 -1.596 0.111076
envconyes 0.16045 0.40575 0.395 0.692690
vparksyes 0.21119 0.42780 0.494 0.621760
tvenv -0.13962 0.15103 -0.924 0.355730
majoryes 0.63731 0.38715 1.646 0.100386
sigma 2.93924 0.29179 10.073 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
--------------------------------------------
Loading required package: Ecfun
Attaching package: 'Ecfun'
The following object is masked from 'package:miscTools':
checkNames
The following object is masked from 'package:base':
sign
Attaching package: 'Ecdat'
The following object is masked _by_ '.GlobalEnv':
Kakadu
The following object is masked from 'package:datasets':
Orange
OLS estimate:
Call:
lm(formula = log(wage) ~ factor(educ) + poly(exper, 2), data = Bwages)
Residuals:
Min 1Q Median 3Q Max
-1.60391 -0.15586 -0.00126 0.17378 1.07186
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.97292 0.02964 66.570 < 2e-16 ***
factor(educ)2 0.14117 0.03408 4.142 3.63e-05 ***
factor(educ)3 0.29670 0.03283 9.039 < 2e-16 ***
factor(educ)4 0.45290 0.03372 13.433 < 2e-16 ***
factor(educ)5 0.62903 0.03401 18.496 < 2e-16 ***
poly(exper, 2)1 6.37744 0.30192 21.123 < 2e-16 ***
poly(exper, 2)2 -2.20628 0.28756 -7.672 3.06e-14 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.2868 on 1465 degrees of freedom
Multiple R-squared: 0.3767, Adjusted R-squared: 0.3741
F-statistic: 147.6 on 6 and 1465 DF, p-value: < 2.2e-16
Interval regression:
--------------------------------------------
Interval regression
Maximum Likelihood estimation
BHHH maximisation, 8 iterations
Return code 2: successive function values within tolerance limit
Log-Likelihood: -1353.956
1472 observations, 8 free parameters (df = 1464)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.893994 0.038646 49.009 < 2e-16 ***
factor(educ)2 0.213102 0.043328 4.918 9.71e-07 ***
factor(educ)3 0.368797 0.041728 8.838 < 2e-16 ***
factor(educ)4 0.530030 0.043067 12.307 < 2e-16 ***
factor(educ)5 0.717596 0.041679 17.217 < 2e-16 ***
poly(exper, 2)1 6.498874 0.355411 18.286 < 2e-16 ***
poly(exper, 2)2 -2.168018 0.289443 -7.490 1.18e-13 ***
sigma 0.284659 0.006062 46.961 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
--------------------------------------------
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.