smoothSurvey  R Documentation 
This function calculates the direct estimates by region and fit a simple spatial smoothing model to the direct estimates adjusting for survey design. Normal or binary variables are currently supported. For binary variables, the logit transformation is performed on the direct estimates of probabilities, and a Gaussian additive model is fitted on the logit scale using INLA.
smoothSurvey(
data,
geo = NULL,
Amat = NULL,
region.list = NULL,
X = NULL,
X.unit = NULL,
responseType = c("binary", "gaussian")[1],
responseVar,
strataVar = "strata",
weightVar = "weights",
regionVar = "region",
clusterVar = "~v001+v002",
pc.u = 1,
pc.alpha = 0.01,
pc.u.phi = 0.5,
pc.alpha.phi = 2/3,
CI = 0.95,
formula = NULL,
timeVar = NULL,
time.model = c("rw1", "rw2")[1],
include_time_unstruct = FALSE,
type.st = 1,
direct.est = NULL,
direct.est.var = NULL,
is.unit.level = FALSE,
is.agg = FALSE,
strataVar.within = NULL,
totalVar = NULL,
weight.strata = NULL,
nsim = 1000,
save.draws = FALSE,
smooth = TRUE,
...
)
fitGeneric(
data,
geo = NULL,
Amat = NULL,
region.list = NULL,
X = NULL,
X.unit = NULL,
responseType = c("binary", "gaussian")[1],
responseVar,
strataVar = "strata",
weightVar = "weights",
regionVar = "region",
clusterVar = "~v001+v002",
pc.u = 1,
pc.alpha = 0.01,
pc.u.phi = 0.5,
pc.alpha.phi = 2/3,
CI = 0.95,
formula = NULL,
timeVar = NULL,
time.model = c("rw1", "rw2")[1],
include_time_unstruct = FALSE,
type.st = 1,
direct.est = NULL,
direct.est.var = NULL,
is.unit.level = FALSE,
is.agg = FALSE,
strataVar.within = NULL,
totalVar = NULL,
weight.strata = NULL,
nsim = 1000,
save.draws = FALSE,
smooth = TRUE,
...
)
data 
The input data frame. The input data with column of the response variable (

geo 
Deprecated argument from early versions. 
Amat 
Adjacency matrix for the regions. If set to NULL, the IID spatial effect will be used. 
region.list 
a vector of region names. Only used when IID model is used and the adjacency matrix not specified. This allows the output to include regions with no sample in the data. When the spatial adjacency matrix is specified, the column names of the adjacency matrix will be used to determine region.list. If set to NULL, all regions in the data are used. 
X 
Areal covariates data frame. One of the column name needs to match the 
X.unit 
Column names of unitlevel covariates. When 
responseType 
Type of the response variable, currently supports 'binary' (default with logit link function) or 'gaussian'. 
responseVar 
the response variable 
strataVar 
the strata variable used in the arealevel model. 
weightVar 
the weights variable 
regionVar 
Variable name for region. 
clusterVar 
Variable name for cluster. For arealevel model, this should be a formula for cluster in survey design object, e.g., '~clusterID + householdID'. For unitlevel model, this should be the variable name for cluster unit. 
pc.u 
hyperparameter U for the PC prior on precisions. 
pc.alpha 
hyperparameter alpha for the PC prior on precisions. 
pc.u.phi 
hyperparameter U for the PC prior on the mixture probability phi in BYM2 model. 
pc.alpha.phi 
hyperparameter alpha for the PC prior on the mixture probability phi in BYM2 model. 
CI 
the desired posterior credible interval to calculate 
formula 
a string of userspecified random effects model to be used in the INLA call 
timeVar 
The variable indicating time period. If set to NULL then the temporal model and spacetime interaction model are ignored. 
time.model 
the model for temporal trends and interactions. It can be either "rw1" or "rw2". 
include_time_unstruct 
Indicator whether to include the temporal unstructured effects (i.e., shocks) in the smoothed estimates from clusterlevel model. The argument only applies to the unitlevel models. Default is FALSE which excludes all unstructured temporal components. If set to TRUE all the unstructured temporal random effects will be included. 
type.st 
can take values 0 (no interaction), or 1 to 4, corresponding to the type I to IV spacetime interaction. 
direct.est 
data frame of direct estimates, with column names of response and region specified by 
direct.est.var 
the column name corresponding to the variance of direct estimates. 
is.unit.level 
logical indicator of whether unitlevel model is fitted instead of arealevel model. 
is.agg 
logical indicator of whether the input is at the aggregated counts by cluster. Only used for unitlevel model and binary response variable. 
strataVar.within 
the variable specifying within region stratification variable. This is only used for the unitlevel model. 
totalVar 
the variable specifying total observations in 
weight.strata 
a data frame with one column corresponding to 
nsim 
number of posterior draws to take. This is only used for the unitlevel model when 
save.draws 
logical indicator of whether to save the full posterior draws. 
smooth 
logical indicator of whether to perform smoothing. If set to FALSE, a data frame of direct estimate is returned. Only used when 
... 
additional arguments passed to 
The function smoothSurvey
replaces the previous function name fitGeneric
(before version 1.0.0).
HT 
Direct estimates 
smooth 
Smoothed direct estimates 
fit 
a fitted INLA object 
CI 
input argument 
Amat 
input argument 
responseType 
input argument 
formula 
INLA formula 
Zehang Richard Li
getDirectList
, smoothDirect
## Not run:
##
## 1. Arealevel model with binary response
##
data(DemoData2)
data(DemoMap2)
fit0 < smoothSurvey(data=DemoData2,
Amat=DemoMap2$Amat, responseType="binary",
responseVar="tobacco.use", strataVar="strata",
weightVar="weights", regionVar="region",
clusterVar = "~clustid+id", CI = 0.95)
summary(fit0)
# if only direct estimates without smoothing is of interest
fit0.dir < smoothSurvey(data=DemoData2,
Amat=DemoMap2$Amat, responseType="binary",
responseVar="tobacco.use", strataVar="strata",
weightVar="weights", regionVar="region",
clusterVar = "~clustid+id", CI = 0.95, smooth = FALSE)
# posterior draws can be returned with save.draws = TRUE
fit0.draws < smoothSurvey(data=DemoData2,
Amat=DemoMap2$Amat, responseType="binary",
responseVar="tobacco.use", strataVar="strata",
weightVar="weights", regionVar="region",
clusterVar = "~clustid+id", CI = 0.95, save.draws = TRUE)
# notice the posterior draws are on the latent scale
head(fit0.draws$draws.est[, 1:10])
# Example with regionlevel covariates
Xmat < aggregate(age~region, data = DemoData2,
FUN = function(x) mean(x))
fit1 < smoothSurvey(data=DemoData2,
Amat=DemoMap2$Amat, responseType="binary",
X = Xmat,
responseVar="tobacco.use", strataVar="strata",
weightVar="weights", regionVar="region",
clusterVar = "~clustid+id", CI = 0.95)
# Example with using only direct estimates as input instead of the full data
direct < fit0$HT[, c("region", "HT.est", "HT.var")]
fit2 < smoothSurvey(data=NULL, direct.est = direct,
Amat=DemoMap2$Amat, regionVar="region",
responseVar="HT.est", direct.est.var = "HT.var",
responseType = "gaussian")
# Check it is the same as fit0
plot(fit2$smooth$mean, fit0$smooth$mean)
# Example with using only direct estimates as input,
# and after transformation into a Gaussian smoothing model
# Notice: the output are on the same scale as the input
# and in this case, the logit estimates.
direct.logit < fit0$HT[, c("region", "HT.logit.est", "HT.logit.var")]
fit3 < smoothSurvey(data=NULL, direct.est = direct.logit,
Amat=DemoMap2$Amat, regionVar="region",
responseVar="HT.logit.est", direct.est.var = "HT.logit.var",
responseType = "gaussian")
# Check it is the same as fit0
plot(fit3$smooth$mean, fit0$smooth$logit.mean)
# Example with nonspatial smoothing using IID random effects
fit4 < smoothSurvey(data=DemoData2, responseType="binary",
responseVar="tobacco.use", strataVar="strata",
weightVar="weights", regionVar="region",
clusterVar = "~clustid+id", CI = 0.95)
# Example with missing regions in the raw input
DemoData2.sub < subset(DemoData2, region != "central")
fit.without.central < smoothSurvey(data=DemoData2.sub,
Amat=NULL, responseType="binary",
responseVar="tobacco.use", strataVar="strata",
weightVar="weights", regionVar="region",
clusterVar = "~clustid+id", CI = 0.95)
fit.without.central$HT
fit.without.central$smooth
fit.without.central < smoothSurvey(data=DemoData2.sub,
Amat=NULL, region.list = unique(DemoData2$region),
responseType="binary",
responseVar="tobacco.use", strataVar="strata",
weightVar="weights", regionVar="region",
clusterVar = "~clustid+id", CI = 0.95)
fit.with.central$HT
fit.with.central$smooth
# Using the formula argument, further customizations can be added to the
# model fitted. For example, we can fit the FayHarriot model with
# IID effect instead of the BYM2 random effect as follows.
# The "region.struct" and "hyperpc1" are picked to match internal object
# names. Other object names can be inspected from the source of smoothSurvey.
fit5 < smoothSurvey(data=DemoData2,
Amat=DemoMap2$Amat, responseType="binary",
formula = "f(region.struct, model = 'iid', hyper = hyperpc1)",
pc.u = 1, pc.alpha = 0.01,
responseVar="tobacco.use", strataVar="strata",
weightVar="weights", regionVar="region",
clusterVar = "~clustid+id", CI = 0.95)
# Check it is the same as fit4, notice the region order may be different
regions < fit5$smooth$region
plot(fit4$smooth[match(regions, fit4$smooth$region),]$logit.mean, fit5$smooth$logit.mean)
##
## 2. Unitlevel model with binary response
##
# For unitlevel models, we need to create stratification variable within regions
data < DemoData2
data$urbanicity < "rural"
data$urbanicity[grep("urban", data$strata)] < "urban"
# Betabinomial likelihood is used in this model
fit6 < smoothSurvey(data=data,
Amat=DemoMap2$Amat, responseType="binary",
X = Xmat, is.unit.level = TRUE,
responseVar="tobacco.use", strataVar.within = "urbanicity",
regionVar="region", clusterVar = "clustid", CI = 0.95)
# We may use aggregated PSUlevel counts as input as well
# in the case of modeling a binary outcome
data.agg < aggregate(tobacco.use~region + urbanicity + clustid,
data = data, FUN = sum)
data.agg.total < aggregate(tobacco.use~region + urbanicity + clustid,
data = data, FUN = length)
colnames(data.agg.total)[4] < "total"
data.agg < merge(data.agg, data.agg.total)
head(data.agg)
fit7 < smoothSurvey(data=data.agg,
Amat=DemoMap2$Amat, responseType="binary",
X = Xmat, is.unit.level = TRUE, is.agg = TRUE,
responseVar = "tobacco.use", strataVar.within = "urbanicity",
totalVar = "total", regionVar="region", clusterVar = "clustid", CI = 0.95)
# Check it is the same as fit6
plot(fit6$smooth$mean, fit7$smooth$mean)
##
## 3. Arealevel model with continuous response
##
# The smoothing model is the same as arealevel model with binary response
# the continuous direct estimates are smoothed instead of
# their logittransformed versions for binary response.
fit8 < smoothSurvey(data=DemoData2, Amat=DemoMap2$Amat,
responseType="gaussian", responseVar="age", strataVar="strata",
weightVar="weights", regionVar="region",
pc.u.phi = 0.5, pc.alpha.phi = 0.5,
clusterVar = "~clustid+id", CI = 0.95)
##
## 4. Unitlevel model with continuous response
## (or nested error models)
# The unitlevel model assumes for each of the ith unit,
# Y_{i} ~ intercept + region_effect + IID_i
# where IID_i is the error term specific to ith unit
# When more than one level of cluster sampling is carried out,
# they are ignored here. Only the input unit is considered.
# So here we do not need to specify clusterVar any more.
fit9 < smoothSurvey(data= data,
Amat=DemoMap2$Amat, responseType="gaussian",
is.unit.level = TRUE, responseVar="age", strataVar.within = NULL,
regionVar="region", clusterVar = NULL, CI = 0.95)
# To compare, we may also model PSUlevel responses. As an illustration,
data.median < aggregate(age~region + urbanicity + clustid,
data = data, FUN = median)
fit10 < smoothSurvey(data= data.median,
Amat=DemoMap2$Amat, responseType="gaussian",
is.unit.level = TRUE, responseVar="age", strataVar.within = NULL,
regionVar="region", clusterVar = "clustid", CI = 0.95)
# To further incorporate withinarea stratification
fit11 < smoothSurvey(data = data,
Amat = DemoMap2$Amat, responseType = "gaussian",
is.unit.level = TRUE, responseVar="age", strataVar.within = "urbanicity",
regionVar = "region", clusterVar = NULL, CI = 0.95)
# Notice the usual output is now stratified within each region
# The aggregated estimates require strata proportions for each region
# For illustration, we set strata population proportions below
prop < data.frame(region = unique(data$region),
urban = 0.3,
rural = 0.7)
fit12 < smoothSurvey(data=data,
Amat=DemoMap2$Amat, responseType="gaussian",
is.unit.level = TRUE, responseVar="age", strataVar.within = "urbanicity",
regionVar="region", clusterVar = NULL, CI = 0.95,
weight.strata = prop)
# aggregated outcome
head(fit12$smooth.overall)
# Compare aggregated outcome with direct aggregating posterior means.
# There could be small differences if only 1000 posterior draws are taken.
est.urb < subset(fit11$smooth, strata == "urban")
est.rural < subset(fit11$smooth, strata == "rural")
est.mean.post < est.urb$mean * 0.3 + est.rural$mean * 0.7
plot(fit12$smooth.overall$mean, est.mean.post)
##
## 6. Unitlevel model with continuous response and unitlevel covariate
##
# For arealevel prediction, arealevel covariate mean needs to be
# specified in X argument. And unitlevel covariate names are specified
# in X.unit argument.
set.seed(1)
sim < data.frame(region = rep(c(1, 2, 3, 4), 1000),
X1 = rnorm(4000), X2 = rnorm(4000))
Xmean < aggregate(.~region, data = sim, FUN = sum)
sim$Y < rnorm(4000, mean = sim$X1 + 0.3 * sim$X2 + sim$region)
samp < sim[sample(1:4000, 20), ]
fit.sim < smoothSurvey(data=samp ,
X.unit = c("X1", "X2"),
X = Xmean, Amat=NULL, responseType="gaussian",
is.unit.level = TRUE, responseVar="Y", regionVar = "region",
pc.u = 1, pc.alpha = 0.01, CI = 0.95)
## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.