Estimation of an underlying individual-level logistic regression model, using aggregate data alone, individual-level data alone or a combination of aggregate and individual-level data. Any number of covariates can be included in the individual-level regression. Covariates can be binary or categorical, expressed as proportions over the group, or normally-distributed, expressed as within-area means and optional covariances. A general formula for group-level (contextual) effects can also be supplied.
formula 
A model formula containing the group-level binomial response on the left-hand side, and general group-level covariates on the right-hand side. For example,
If 
binary 
An optional model formula with an empty left-hand
side. The right-hand side should contain the names of any group-level
proportions, which are to be modelled as individual-level binary
predictors of the response given in

categorical 
An optional list of matrices or data frames.
Each element corresponds to a categorical covariate. Each element
has the same number of rows as the aggregate data, and number of columns
corresponding to the number of levels of the categorical covariate.
The cells give the number or proportion of individuals in the area in each
category. These will be modelled as individual-level predictors of
the response given in 
normal 
An optional model formula with an empty left-hand
side. The right-hand side should list variables containing the
group-level means of normally-distributed covariates. These will be
modelled as individual-level predictors of the response given in

iformula 
A model for the corresponding individual-level data. The
individual-level binary response should be on the left-hand side,
and the individual-level covariates should be on the right-hand side.
They should represent the same covariates, in the same order, as given in
If 
data 
Data frame containing the group-level variables given in

idata
Data frame containing the individual-level variables
given in 
groups
A group-level variable containing the group identifiers
to be matched with the groups given in 
igroups
An individual-level variable containing the group
identifiers of the individual-level data to be matched with the
groups given in 
strata 
A matrix with the same number of rows as the aggregate
data. Rows represent groups, and columns
represent strata occupancy probabilities, often estimated as
observed occupancy proportions. The relative risks for the strata
will be included as fixed offsets in the underlying logistic
regression, using the probabilities supplied in 
istrata
An individual-level variable
indicating the stratum an individual occupies. This should be a
factor with the levels corresponding to the columns of the matrix

pstrata 
A vector with one element for each stratum, giving the assumed baseline outcome probabilities for the strata. 
cross 
A matrix giving the joint within-area distribution of
all the covariates supplied in column 1: covariate 1 absent, covariate 2 absent, ..., covariate n-1 absent, covariate n absent (assuming n binary covariates, with the obvious generalisation
for categorical covariates). If 
norm.var 
A data frame, matrix or list, supplying the within-area covariances of the continuous covariates. If If

random 
If 
pars 
Vector of initial values of the model parameters, given in the following order: logit-scale intercept, If not supplied, the initial values are 0 for all covariate effects, 1 for the random effects standard deviation. The intercept is initialised to the logit mean outcome proportion over groups from the aggregate data.
fixed 
If 
model 
If "marginal" then the ecological group-level risk is based on integrating over binary individual-level covariates. This is suitable if the aggregate exposures are estimated using a survey of individuals in the area. If "conditional" then the binary individual-level covariates are conditioned on, and the group-level risk is the normal approximation model described by Wakefield (2004). This is suitable if the aggregate exposures are estimated using a full population census.
outcome 
Distribution of the aggregate outcome, by default
"binomial". 
gh.points 
Number of points for Gauss-Hermite numerical integration in the random-effects model.
iter.adapt
Number of adaptive iterations to estimate the mode and scale for Gauss-Hermite numerical integration in the random-effects model.
... 
Arguments passed to 
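To make the role of gh.points more concrete, the following is a minimal sketch of plain (non-adaptive) Gauss-Hermite quadrature in base R, used to average a logistic risk over a normally-distributed random effect. It is an illustration only and not the code used by the package: the helper names gauss_hermite and normal_mean, and the values of mu and sigma, are assumptions made for this example, and the adaptive step controlled by iter.adapt is not shown.

```r
## Illustrative sketch only: plain Gauss-Hermite quadrature in base R.
## gauss_hermite() and normal_mean() are hypothetical helpers, not part
## of any package.

## Nodes and weights for the weight function exp(-x^2), computed from the
## eigen-decomposition of the Jacobi matrix (Golub-Welsch method).
gauss_hermite <- function(n) {
  i <- seq_len(n - 1)
  J <- matrix(0, n, n)
  J[cbind(i, i + 1)] <- sqrt(i / 2)
  J[cbind(i + 1, i)] <- sqrt(i / 2)
  e <- eigen(J, symmetric = TRUE)
  ord <- order(e$values)
  list(nodes = e$values[ord],
       weights = sqrt(pi) * e$vectors[1, ord]^2)
}

## Approximate E[f(U)] for U ~ Normal(0, sd^2) using n quadrature points.
normal_mean <- function(f, sd, n = 10) {
  gh <- gauss_hermite(n)
  sum(gh$weights * f(sqrt(2) * sd * gh$nodes)) / sqrt(pi)
}

mu    <- qlogis(0.05)  # assumed baseline log odds
sigma <- 0.5           # assumed random-effects standard deviation

## Risk averaged over the random effect, by quadrature ...
risk_gh <- normal_mean(function(u) plogis(mu + u), sd = sigma, n = 10)
## ... and by general-purpose numerical integration, for comparison.
risk_ref <- integrate(function(u) plogis(mu + u) * dnorm(u, sd = sigma),
                      -Inf, Inf)$value
c(gauss.hermite = risk_gh, integrate = risk_ref)
```

More quadrature points give a more accurate integral at a higher computing cost, which is the trade-off that gh.points controls.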
Individual data are simply modelled by a logistic regression.
Aggregate outcomes are modelled as binomial, with area-level risk obtained by integrating the underlying individual-level logistic regression model over the within-area distribution of the covariates.
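For a single binary covariate this integration reduces to a weighted average of two individual-level risks. The sketch below shows the idea in base R. It is an illustration under assumed values (mu, alpha and phi are invented for the example) and not the package's implementation.

```r
## Illustrative sketch only: area-level risk for one binary covariate.
mu    <- qlogis(0.05)       # assumed baseline log odds (unexposed)
alpha <- log(2)             # assumed individual-level log odds ratio
phi   <- c(0.1, 0.5, 0.9)   # assumed proportion exposed in three areas

## Integrate the individual-level logistic model over the within-area
## distribution of the covariate: unexposed with probability 1 - phi,
## exposed with probability phi.
area_risk <- (1 - phi) * plogis(mu) + phi * plogis(mu + alpha)

## For comparison, plugging the area proportion directly into the
## logistic function gives a different (here smaller) risk.
naive_risk <- plogis(mu + alpha * phi)

cbind(phi, area_risk, naive_risk)
```

The gap between the two columns is why the area-level model is built by averaging individual risks rather than by applying the logistic function to area-level proportions.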
In the model for combined individual and aggregate data, the individual and aggregate components share the same coefficients.
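One way to picture the shared coefficients is as a single log-likelihood with two terms, maximised jointly. The following base R sketch does this for one binary covariate on simulated data. All object names and parameter values are assumptions made for the illustration, and the code is not taken from the package.

```r
## Illustrative sketch only: joint likelihood with shared coefficients.
set.seed(1)
mu_true    <- qlogis(0.2)   # assumed true intercept
alpha_true <- log(2)        # assumed true individual-level log odds ratio

## Simulated aggregate data: ng areas, N individuals each, phi exposed.
ng    <- 50
N     <- rep(200, ng)
phi   <- runif(ng)
p_agg <- (1 - phi) * plogis(mu_true) + phi * plogis(mu_true + alpha_true)
y_agg <- rbinom(ng, N, p_agg)

## Simulated individual-level data from the same underlying model.
n_ind <- 300
x_ind <- rbinom(n_ind, 1, 0.5)
y_ind <- rbinom(n_ind, 1, plogis(mu_true + alpha_true * x_ind))

## Negative log-likelihood: binomial term for the areas plus Bernoulli
## term for the individuals, both using the same (mu, alpha).
negloglik <- function(par) {
  mu <- par[1]
  alpha <- par[2]
  p <- (1 - phi) * plogis(mu) + phi * plogis(mu + alpha)
  ll_agg <- sum(dbinom(y_agg, N, p, log = TRUE))
  ll_ind <- sum(dbinom(y_ind, 1, plogis(mu + alpha * x_ind), log = TRUE))
  -(ll_agg + ll_ind)
}

fit <- optim(c(0, 0), negloglik, method = "BFGS", hessian = TRUE)
est <- fit$par                           # estimates of (mu, alpha)
se  <- sqrt(diag(solve(fit$hessian)))    # approximate standard errors
rbind(estimate = est, se = se)
```

Dropping either term from negloglik gives the aggregate-only or individual-only fit, so the same function can be used to compare the precision of the three approaches.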
Aggregate data alone can be sufficient for inference of individual-level relationships, provided the between-area variability of the exposures is large compared to the within-area variability.
When there are several binary covariates, it is usually advisable to
account for their within-area distribution, using cross.
See Jackson et al. (2006, 2008) for further details.
A list with components:
call 
The call to 
lik 
Minus twice the log-likelihood at the estimates.
ors.ctx
Matrix of estimated odds ratios and 95% confidence intervals for the area-level covariates.
ors.indiv
Matrix of estimated odds ratios and 95% confidence intervals for the individual-level covariates.
random
The estimated random-effects standard deviation.
mod 
A list of constants describing the model and data (not useful to end users). 
corrmat 
The correlation matrix of the maximum likelihood estimates (on the optimized scale, for example log odds ratios for covariates). 
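As a reminder of how odds ratios relate to estimates on the log odds ratio scale, the sketch below builds a table of odds ratios with 95% Wald confidence intervals in base R. The estimates, standard errors and column names are invented for the illustration, and this is a common construction rather than a statement of how the package computes its intervals.

```r
## Illustrative sketch only: odds ratios and 95% Wald intervals from
## hypothetical log odds ratio estimates and standard errors.
log_or <- c(nonwhite = log(1.5), smoke = log(2))  # hypothetical estimates
se     <- c(nonwhite = 0.20, smoke = 0.15)        # hypothetical standard errors
z      <- qnorm(0.975)                            # about 1.96

ors <- cbind(OR  = exp(log_or),
             l95 = exp(log_or - z * se),
             u95 = exp(log_or + z * se))
round(ors, 2)
```

Intervals built this way are symmetric on the log scale and therefore asymmetric around the odds ratio itself.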
C. H. Jackson chris.jackson@mrc-bsu.cam.ac.uk
C. H. Jackson, N. G. Best, and S. Richardson. (2006) Improving ecological inference using individual-level data. Statistics in Medicine, 25(12):2136–2159.
C. H. Jackson, N. G. Best, and S. Richardson. (2008) Hierarchical related regression for combining aggregate and survey data in studies of socio-economic disease risk factors. Journal of the Royal Statistical Society, Series A, 171(1):159–178.
J. Wakefield. (2004) Ecological inference for 2 x 2 tables (with discussion). Journal of the Royal Statistical Society, Series A, 167(3):385–445.
J. Wakefield and R. Salway. (2001) A statistical framework for ecological and aggregate studies. Journal of the Royal Statistical Society, Series A, 164(1):119–137.
library(ecoreg)

## Simulate some aggregate data and some combined aggregate and
## individual data.
ng <- 50
N <- rep(100, ng)
set.seed(1)
ctx <- cbind(deprivation = rnorm(ng), mean.income = rnorm(ng))
phi <- cbind(nonwhite = runif(ng), smoke = runif(ng))
sim.df <- as.data.frame(cbind(ctx, phi))
mu <- qlogis(0.05) ## Disease with approximate 5% prevalence
## Odds ratios for group-level deprivation and mean income
alpha.c <- log(c(1.01, 1.02))
## Odds ratios for individual-level ethnicity and smoking
alpha <- log(c(1.5, 2))
sim1 <- sim.eco(N, ctx=~deprivation+mean.income, binary=~nonwhite+smoke,
                data = sim.df, mu=mu, alpha.c=alpha.c, alpha=alpha)
sim2 <- sim.eco(N, ctx=~deprivation+mean.income, binary=~nonwhite+smoke,
                data = sim.df, mu=mu, alpha.c=alpha.c, alpha=alpha, isam=7)
## Fit the model to recover the simulated odds ratios.
aggdata <- as.data.frame(cbind(y=sim1$y, sim.df))
agg.eco <- eco(cbind(y, N) ~ deprivation + mean.income,
               binary = ~ nonwhite + smoke, data = aggdata)
agg.eco
## Combining with individual-level data
## doesn't improve the precision of the estimates.
agg.indiv.eco <- eco(cbind(y, N) ~ deprivation + mean.income,
                     binary = ~ nonwhite + smoke,
                     iformula = y ~ deprivation + mean.income + nonwhite + smoke,
                     data = aggdata, idata=sim2$idata)
agg.indiv.eco
## However, suppose we have much lower between-area variance in the
## mean covariate value.
phi <- cbind(nonwhite = runif(ng, 0, 0.3), smoke = runif(ng, 0.1, 0.4))
sim.df <- as.data.frame(cbind(ctx, phi))
sim1 <- sim.eco(N, ctx=~deprivation+mean.income, binary=~nonwhite+smoke,
                data = sim.df, mu=mu, alpha.c=alpha.c, alpha=alpha)
sim2 <- sim.eco(N, ctx=~deprivation+mean.income, binary=~nonwhite+smoke,
                data = sim.df, mu=mu, alpha.c=alpha.c, alpha=alpha, isam=10)
aggdata <- as.data.frame(cbind(y=sim1$y, sim.df))
## The aggregate data now contain little information about the
## individual-level effects, and we get biased estimates of the true
## individual model.
agg.eco <- eco(cbind(y, N) ~ deprivation + mean.income,
               binary = ~ nonwhite + smoke, data = aggdata)
agg.eco
## We need individual-level data to be able to estimate the
## individual-level effects accurately.
agg.indiv.eco <- eco(cbind(y, N) ~ deprivation + mean.income,
                     binary = ~ nonwhite + smoke,
                     iformula = y ~ deprivation + mean.income + nonwhite + smoke,
                     data = aggdata, idata=sim2$idata)
agg.indiv.eco
## But then why not just study the individual data? Combining with
## aggregate data improves precision.
indiv.eco <- eco(iformula = y ~ deprivation + mean.income + nonwhite + smoke,
                 idata=sim2$idata)
indiv.eco
