tobit1: Tobit I family for censored GAM

Description Usage Arguments Details Value References See Also Examples

View source: R/tobit1.R

Description

This function implements the Tobit I family for the mgcv package.

Usage

1
2
tobit1(link="identity", left.threshold=-Inf, right.threshold=Inf, 
theta=NULL, initial.theta=0)

Arguments

link

The link function: one of "log", "identity", "inverse", "sqrt", or a power link.

left.threshold

Threshold value, for which response values below this will be censored. Can be a scalar or a vector of same length as the response.

right.threshold

Threshold value, for which response values above this will be censored. Can be a scalar or a vector of same length as the response.

theta

The log std error. If left NULL, it is estimated.

initial.theta

Optional parameter to set initial theta value if it is being estimated - change this value if the variance is very different from 1.

Details

Under the Tobit I model, given a value sigma and a conditional mean of mu, and left and right threshold values lt and rt, response values y are distributed as

lt if z < lt

rt if z > rt

z otherwise

where z is distribution Normal(mu, sigma). (Note that we have extended the model to include right as well as left censoring.

This function allows a non-linear relationship be estimated between mu and the covariates in a restricted maximum likelihood approach, via application of Wood (2016). We allow differing levels of censorship across dataset by allowing the left and right thresholds to be different between data points.

See the examples for more details of how to fit in practice.

Value

An object inheriting from class family for use with the mgcv package.

References

Wood, S.N., N. Pya and B. Saefken (2016), Smoothing parameter and model selection for general smooth models. Journal of the American Statistical Association. <URL: http://arxiv.org/abs/1511.03864>

See Also

gam,ziplss

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
# Generate random data
set.seed(1)
x <- matrix(2*rnorm(300), 100)
yn <- 2*x[,3] + 4*cos(x[,1]*2)
y <- yn + rnorm(100)
ycensored <- pmax(y, 0) # data left-censored at 0
ycensored <- pmin(ycensored, 4) # data right-censored at 4

par(mfrow = c(3,3))

# True model
plot(gam(y ~ s(x[,1]) + s(x[,2]) + s(x[, 3])), ylim=c(-5, 5), main = "True")

# Naive estimation
plot(gam(ycensored ~ s(x[,1]) + s(x[,2]) + s(x[, 3])), ylim=c(-5, 5), main = "Naive")

# Tobit I estimation
m <- gam(ycensored ~ s(x[,1]) + s(x[,2]) + s(x[, 3]), family = tobit1(left.threshold=0))
summary(m) #note x[,2] is not significant
m$family$getTheta(FALSE) #estimate of theta
m$family$getTheta(TRUE) #estimate of sigma = exp(theta)
plot(m, ylim = c(-5, 5), main = "Tobit I") 

# More realistic dataset requires some data processing
data(nutrient)
# For this dataset, all values >1 are censored. At location D values are censored below at 0, other
# locations are censored at 0.1.
head(nutrient)
nut = data.frame(day = nutrient$day, location = factor(nutrient$location), y=as.numeric(nutrient$y))
table(nutrient$y[is.na(nut$y)])
summary(nut$y)
nut$upper = 10
nut$lower = ifelse(nut$location == "D", 0, 1)
# Recode the data to communicate which is and isn't censored
nut$y[nutrient$y %in% c("<1", "<0")] = -Inf
nut$y[nutrient$y %in% c(">10")] = Inf
# Missing values are best removed here, or can cause confusion later
nut = na.omit(nut)

# Fit including a random effect for location
m = gam(y~ s(day) + s(location, bs="re") , data = nut, 
family = tobit1(left.threshold=nut$lower, right.threshold = nut$upper))

gam.vcomp(m)
anova(m)
summary(m)

cenGAM documentation built on May 1, 2019, 6:29 p.m.