Estimating the Tobit-1 model with the charitable data set
In micsr: Microeconometrics with R

We'll reproduce here some results obtained by @WILH:08 using a data set which deals with charitable giving. The charitable data set is shiped with the micsr.

knitr::opts_chunk$set(message = FALSE, warning = FALSE, out.width = "100%",
                      fig.align = "center", cache = FALSE, echo = TRUE)
options(knitr.table.format = "latex", knitr.booktabs = TRUE)

library("micsr")

print(charitable, n = 5)

The response is called donation, it measures annual charitable givings in $US. This variable is left-censored for the value of 25, as this value corresponds to the item "less than 25 $US donation". Therefore, for this value, we have households who didn't make any charitable giving and some which made a small giving (from 1 to 24 $US).

The covariates used are the donation made by the parents (donparents), two factors indicating the educational level and religious beliefs (respectively education and religion), annual income (income) and two dummies for living in the south (south) and for married couples (married).

@WILH:08 consider the value of the donation in logs and substract $\ln 25$, so that the response is 0 for households who gave no donation or a small donation.

charitable$logdon <- with(charitable, log(donation) - log(25))

The tobit model can be estimated by maximum likelihood using AER::tobit, censReg::censReg or with the tobit1 package.

char_form <- logdon ~ log(donparents) + log(income) +
    education + religion + married + south
if (requireNamespace("AER")){
    library("AER")
    ml_aer <- tobit(char_form, data = charitable)
}
if (requireNamespace("censReg")){
    library("censReg")
    ml_creg <- censReg(char_form, data = charitable)
}
ml <- tobit1(char_form, data = charitable)

tobit1 provide a rich set of estimation methods, especially the SCLS (symetrically censored least squares) estimator proposed by @POWE:86. We also, for pedagogical purposes, estimate the OLS estimator although it is known to be unconsistent.

scls <- update(ml, method = "trimmed")
ols <- update(ml, method = "lm")

The results of the three models are presented in table below

if (requireNamespace("modelsummary")){
    modelsummary::msummary(list("OLS" = ols, "maximum likehihood" = ml, "SCLS" = scls),
                           title = "Estimation of charitable giving models",
                           label = "tab:models", single.row = TRUE, digits = 3)
}

The last two columns of the match exactly the first two columns of [@WILH:08, table 3 page 577]. Note that the OLS estimators are all lower in absolute values than those of the two other estimators, which illustrate the fact that OLS estimators are biased toward zero when the response is censored. The maximum likelihood is consistent and asymtotically efficient if the conditional distribution of $y^*$ (the latent variable) is homoscedastic and normal.

Specification tests for the maximum likelihood estimator can be conducted using conditional moments tests. This can easily be done using the micsr::cmtest function, which can take as input a model fitted by either AER::tobit, censReg::censReg or tobit1::tobit1:

cmtest(ml)

cmtest(ml_aer)
cmtest(ml_creg)

cmtest has a test argument with default value equal to normality. To get a heteroscedasticity test, we would use:

cmtest(ml, test = "heterosc")

Normality and heteroscedasticity are strongly rejected. The values are different from @WILH:08 as he used the "outer product of the gradient" form of the test. These versions of the test can be obtained by setting the OPG argument to TRUE.

cmtest(ml, test = "normality", opg = TRUE)
cmtest(ml, test = "heterosc", opg = TRUE)

Non-normality can be further investigate by testing separately the fact that the skewness and kurtosis indicators are respectively different from 0 and 3.

cmtest(ml, test = "skewness")
cmtest(ml, test = "kurtosis")

The hypothesis that the conditional distribution of the response is mesokurtic is not rejected at the 1% level and the main problem seems to be the asymetry of the distribution, even after taking the logarithm of the response.

This can be illustrated by plotting the (unconditional) distribution of the response (for positive values) and adding to the histogram the normal density curve.

library("ggplot2")
library("dplyr")
moments <- charitable %>% filter(logdon > 0) %>%
    summarise(mu = mean(logdon), sigma = sd(logdon))
ggplot(filter(charitable, logdon > 0), aes(logdon)) +
    geom_histogram(aes(y = ..density..), color = "black",
                   fill = "white", bins = 10) +
    geom_function(fun = dnorm, args = list(mean = moments$mu,
                                           sd = moments$sigma)) +
    labs(x = "log of charitable giving", y = NULL)