## Setup options(scipen=1, digits=2)
In some cases the response variable may be an average of responses from within the same subject or group. In this case, the variances for two observations having the same covariates will not be identical if the size of the groups vary. Instead, it will be inversely proportional to the size of the group. Mathematically, if the observed response is the group average $\bar{Y}i=\sum{j=1}^{n_i} Y_{ij}/n_i$ and $Y_{ij} \sim N(\mu_i,\phi_i)$ where the dispersion parameter, $\phi_i$, may vary by group and depend on covariates then
$$\bar{Y}_i \sim N(\mu_i,\phi_i/n_i).$$
This situation can be accommodated in dalmatian
by specifying a column of weights within the data frame that provides the group size. Here is an example with simulated data that demonstrates this feature and shows the importance of properly accounting for weights.
Data for this example are contained in the data frame weights.df
saved within the file weights-data-1.RData
.
## Load library library(dalmatian) ## Load simulated data data(weights_data_1) head(weights_data_1)
The three columns in the data frame record the number of responses per group (n
), the value of the covariate (x
), and the mean response (y
). The data were generated from the model
$$Y_{ij} \sim N(x_i,\exp(1+1x_i))$$ with $i=1,\ldots,100$ indexing the groups and $j$ the observations within groups. In the data, the number of observations per group ranges from r min(weights_data_1$n)
to r max(weights_data_1$n)
.
First we run the model with no weights.
## Mean model mymean=list(fixed=list(name="alpha",formula=~ x, priors=list(c("dnorm",0,.001)))) ## Variance model mydisp=list(fixed=list(name="psi", formula=~ x, priors=list(c("dnorm",0,.001))), link="log") ## Set working directory ## By default uses a system temp directory. You probably want to change this. workingDir <- tempdir() ## Define list of arguments for jags.model() jm.args = list(file=file.path(workingDir,"weights_1_jags.R"),n.adapt=1000) ## Define list of arguments for coda.samples() cs.args = list(n.iter = 5000, n.thin = 20) ## Run the model using dalmatian results1 <- dalmatian( df = weights_data_1, mean.model = mymean, dispersion.model = mydisp, jags.model.args = jm.args, coda.samples.args = cs.args, response = "y", overwrite = TRUE, debug = FALSE) ## Numerical summary statistics summary1 <- summary(results1) summary1 ## Graphical summaries caterpillar1 <- caterpillar(results1, show = TRUE)
## Extract results mean1 <- summary1$dispFixed[1,"Mean"] lower1 <- summary1$dispFixed[1,"Lower 95%"] upper1 <- summary1$dispFixed[1,"Upper 95%"]
From the summaries we can see that that the intercept in the dispersion model is being underestimated. The true value is 1, but the posterior mean is r mean1
with 95% HPD interval (r lower1
,r upper1
).
We now re-run the model including the weights.
## Specify column containing weights mydisp$weights = "n" ## Run model jm.args = list(file=file.path(workingDir,"weights_2_jags.R"),n.adapt=1000) results2 = dalmatian(df=weights_data_1, mean.model=mymean, dispersion.model=mydisp, jags.model.args=jm.args, coda.samples.args=cs.args, response="y", overwrite = TRUE, debug=FALSE) ## Numerical summary statistics summary2 <- summary(results2) summary2 ## Graphical summaries caterpillar2 <- caterpillar(results2, show = TRUE)
## Extract results mean2 <- summary2$dispFixed[1,"Mean"] lower2 <- summary2$dispFixed[1,"Lower 95%"] upper2 <- summary2$dispFixed[1,"Upper 95%"]
The new output shows that the estimate of the intercept for the variance model, r mean2
, is now very close to the truth and the 95% credible interval, (r lower2
,r upper2
) easily covers the true value of 1.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.