library(nlme)
library(ggplot2)
library(gridExtra)

Fitting a linear mixed model with temporal correlation in R

As mentioned at the start of the practicals, fitting a linear mixed effects model with a (subject-specific) temporal process is relatively straightforward in nlme. For further thoughts, see Ben Bolker's ideas. Throughout this practical, we will be working with the Ovary data set.

  1. Exploratory analysis:

i. You should first have a look at the data set. Typing ?Ovary gives some information. What is the response? What are the subjects, and how many are there? How many observations are there per subject?

Solution

To look at the variables and variable types, use

 str(Ovary)

The response variable is the number of follicles for each Mare, with measurements taken at multiple times for each of the subjects. There are 11 subjects and between 26 and 31 measurements per subject. One way to extract this information is to use the dplyr library:

library(dplyr)
summarise(group_by(Ovary, Mare), no_rows = length(Mare))

or

table(Ovary$Mare)

ii. Now look at the data itself using a panel plot. Do you notice any patterns or trends in the response?

Solution

As previously, the panel plot can be created using ggplot as follows:

plt0 <- ggplot(data=Ovary,aes(y=follicles,x=Time, group= Mare ))
plt0+geom_point(color = "steelblue")+facet_wrap(. ~ Mare)

iii. The data seems to have some cyclic behaviour. Ignoring subject-level behaviour, a linear model which captures this behaviour has the form

\begin{align} Y_{ij} &\sim\mathrm{Normal}(\mu_{ij},\sigma^2)\\ \mu_{ij} &= \beta_{0}+\beta_1\cos(2 \pi t_{ij})+\beta_2\sin(2\pi t_{ij}) \end{align}

where $i$ indexes subject and $j$ indexes within subject measurement.

What are the interpretations of $\beta_1$ and $\beta_2$ in this model? Hint: you might want to think about trigonometric identities! Why do we need to multiply time by $2\pi$? How would this change if time were instead in days of the month, or days of the year?

Solution

Recalling that $R\sin(2\pi t+\alpha)=R\cos (2\pi t)\sin\alpha+R\sin (2\pi t)\cos\alpha$, we can rewrite the above as $$ \beta_0+(\beta_1^2+\beta_2^2)^{1/2}\sin(2\pi t_{ij}+\tan^{-1}(\beta_1/\beta_2)) $$ from which it is clear that $(\beta_1^2+\beta_2^2)^{1/2}$ determines the amplitude of the cycle and $\tan^{-1}(\beta_1/\beta_2)$ its phase, so $\beta_1$ and $\beta_2$ together control both.

We multiply by $2\pi$ since time is normalised so that the start and end of the cycle correspond to 0 and 1 (see ?Ovary), and the trigonometric functions in R require angles to be given in radians. If time were in days of the month (or year), we would replace $2\pi t$ with $2\pi t/31$ (or $2\pi t/365$).
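A quick numerical check of this identity, using arbitrary illustrative values for $\beta_0$, $\beta_1$ and $\beta_2$ (not the fitted estimates). Note that atan2(b1, b2) is used rather than atan(b1/b2) so that the phase lands in the correct quadrant whatever the signs of the coefficients:

```r
b0 <- 12; b1 <- -3; b2 <- 2                      # arbitrary illustrative values
t  <- seq(0, 1, length.out = 101)

# Linear-predictor form with separate cos and sin terms
lhs <- b0 + b1 * cos(2 * pi * t) + b2 * sin(2 * pi * t)
# Equivalent amplitude-and-phase form
rhs <- b0 + sqrt(b1^2 + b2^2) * sin(2 * pi * t + atan2(b1, b2))

all.equal(lhs, rhs)                              # TRUE
```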

iv. Fit the above model to the Ovary data; what are your parameter estimates? What is the AIC and log-likelihood?

Solution

To fit this model use

ovarylm <- lm(follicles~cos(2*pi*Time)+sin(2*pi*Time),data=Ovary) 

The parameter estimates are

ovarylm$coefficients

The AIC and likelihood are

c(AIC(ovarylm),logLik(ovarylm))
  2. You will now extend the above linear regression model to include random intercepts and random slopes.

i. Write out the statistical model formulations (not the R code) of the linear mixed effects version of this model in the cases of (I) random intercepts only and (II) a random cyclic component. Hint: you only need one random effect in each model.

Solution

The required models are \begin{align} Y_{ij} &\sim\mathrm{Normal}(\mu_{ij},\sigma^2)\\ \mu_{ij} &= \beta_{0}+\beta_1\cos(2 \pi t_{ij})+\beta_2\sin(2\pi t_{ij})+b_i\\ b_i&\sim\mathrm{Normal}(0,\tau^2) \end{align} and \begin{align} Y_{ij} &\sim\mathrm{Normal}(\mu_{ij},\sigma^2)\\ \mu_{ij} &= \beta_{0}+\beta_1\cos(2 \pi t_{ij})+\beta_2\sin(2\pi t_{ij})+b_{1i}\cos(2\pi t_{ij})\\ b_{1i}&\sim\mathrm{Normal}(0,\tau^2). \end{align} There is no need to specify random slopes for both the $\cos$ and $\sin$ terms due to the way that $\beta_1$ and $\beta_2$ combine to give both the amplitude and phase of the cycle.

ii. Using the lme function, fit both of the models specified in 2 (i).

Solution

The models can be fitted as follows:

ovarylme1 <- lme(follicles~cos(2*pi*Time)+sin(2*pi*Time),random=~1|Mare,
                 data=Ovary,method="ML") 
ovarylme2 <- lme(follicles~cos(2*pi*Time)+sin(2*pi*Time),
                 random=~(-1+cos(2*pi*Time))|Mare,data=Ovary,method="ML") 

In both cases the fixed-effects estimates are the same as those obtained for the equivalent linear model. The residual variance $\sigma^2$ is $3.39$ and $4.46$ respectively, and the random-effects variance $\tau^2$ is $2.90$ and $0.000380$ respectively (use summary() to see these).

The -1 in the formula for the random effects is to prevent a random intercept being fitted.
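The variance components quoted above can also be read off with VarCorr(), which tabulates the variance and standard deviation of each random effect and of the residual. The random-intercept model is refit here so the chunk stands alone:

```r
library(nlme)

# Random-intercept model from part 2 (ii)
ovarylme1 <- lme(follicles ~ cos(2*pi*Time) + sin(2*pi*Time),
                 random = ~1 | Mare, data = Ovary, method = "ML")

# First column: variances; second column: standard deviations
VarCorr(ovarylme1)
```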

iii. Can you combine the models fitted in 2 (ii) to allow both random intercepts and random cyclic components?

Solution

To do this,

ovarylme3 <- lme(follicles~cos(2*pi*Time)+sin(2*pi*Time),
                 random=~cos(2*pi*Time)|Mare,data=Ovary,method="ML")

iv. What are the AIC and log-likelihood values for your models? Which would you choose to carry forward?

Solution

c(AIC(ovarylme1), AIC(ovarylme2), AIC(ovarylme3))

AIC values are $1669.6$, $1805.6$ and $1664.8$;

c(logLik(ovarylme1), logLik(ovarylme2), logLik(ovarylme3))

log-likelihoods are $-829.8$, $-897.8$ and $-825.4$. These suggest that the best-fitting model is the one with both a random intercept and a random cyclic component, hence we would move forward with ovarylme3.
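Because ovarylme1 is nested within ovarylme3 and both are fitted by maximum likelihood, anova() also gives a likelihood ratio test as a complement to AIC (bearing in mind that tests on variance components are conservative, since the null value sits on the boundary of the parameter space):

```r
library(nlme)

# Random intercept only, and random intercept plus random cyclic component
ovarylme1 <- lme(follicles ~ cos(2*pi*Time) + sin(2*pi*Time),
                 random = ~1 | Mare, data = Ovary, method = "ML")
ovarylme3 <- lme(follicles ~ cos(2*pi*Time) + sin(2*pi*Time),
                 random = ~cos(2*pi*Time) | Mare, data = Ovary, method = "ML")

# Likelihood ratio test; a small p-value favours the richer model
anova(ovarylme1, ovarylme3)
```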

  3. Temporal structure remaining after fitting the fixed and random effects can be investigated using an ACF plot of the mixed model residuals.

i. Plot an ACF plot for the residuals of the linear mixed effects model with random effects only that was fitted in 2 (ii).

Solution

We can use the ACF function to do this in R,

plot(ACF(ovarylme1,resType="normalized"), alpha=0.05)

The normalised residuals are found by subtracting the fitted values from the observations, dividing by the residual standard deviation, and then multiplying by a scaling factor determined by the correlation structure assumed for the errors.
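If you want the normalised residuals themselves, e.g. to build your own diagnostics, residuals() returns them directly (the random-intercept model is refit here so the chunk is self-contained):

```r
library(nlme)

ovarylme1 <- lme(follicles ~ cos(2*pi*Time) + sin(2*pi*Time),
                 random = ~1 | Mare, data = Ovary, method = "ML")

# One normalised residual per observation
res <- residuals(ovarylme1, type = "normalized")
head(res)
```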

ii. Comment on the results of your ACF plot. How do you know that there is a residual temporal dependence structure?

Solution

For data with no temporal (serial) correlation, we would expect the ACF plot to be close to zero at all non-zero lags and to show no structure. This is clearly not the case here, with correlation decaying as the lag increases. We conclude that there is some dependence structure left in the residuals, even after accounting for the cycle.

iii. Access the help page for corStruct. What correlation structures are available to you? Are all of them relevant to longitudinal data?

Solution

The key structures are the AR(1) and the ARMA$(p,q)$ processes; AR is auto-regressive and ARMA is auto-regressive moving average. Many of the correlation structures are defined for spatial data sets; since in a longitudinal data analysis we are looking for temporal correlation, such correlation structures are not relevant to us.
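To see what an AR(1) structure implies, you can initialise a corAR1 object at a chosen correlation (here 0.5, an arbitrary illustrative value, not an estimate) and inspect the implied within-subject correlation matrix:

```r
library(nlme)

# AR(1) structure at correlation 0.5, indexed by observation order within mare
cs <- Initialize(corAR1(value = 0.5, form = ~1 | Mare), data = Ovary)

# Implied correlation matrix for the first mare:
# entries decay geometrically as 0.5^|lag|
round(corMatrix(cs)[[1]][1:4, 1:4], 3)
```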

iv. Using the correlation argument in the lme function and your findings from 3 (iii), re-fit the model investigated in 3 (i) to include temporal correlation with (i) an AR(1) and (ii) an ARMA(1,1) structure.

Solution

To do this,

ovarylme4 <- lme(follicles~cos(2*pi*Time)+sin(2*pi*Time),random=~1|Mare,
                 data=Ovary,correlation=corAR1(form=~Time|Mare),method="ML")
ovarylme5 <- lme(follicles~cos(2*pi*Time)+sin(2*pi*Time),random=~1|Mare,
                 data=Ovary,correlation=corARMA(form=~Time|Mare,p=1,q=1),
                 method="ML")

Estimates of both the fixed effects and the variances are the same for the two models, but differ slightly from those of the earlier models:

ovarylme4$coefficients$fixed
ovarylme5$coefficients$fixed

The residual variance $\sigma^2$ is $3.19^2$ and the random effects variance $\tau^2$ is $2.43^2$.
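The estimated autocorrelation parameter $\phi$ can be pulled out of the fitted correlation structure directly. The model is refit here so the chunk stands alone; note that the AR(1) process is indexed by observation order within each mare (form = ~1 | Mare), which is equivalent for equally spaced measurements:

```r
library(nlme)

ovarylme4 <- lme(follicles ~ cos(2*pi*Time) + sin(2*pi*Time),
                 random = ~1 | Mare, data = Ovary,
                 correlation = corAR1(form = ~1 | Mare), method = "ML")

# AR(1) coefficient on the natural (correlation) scale
coef(ovarylme4$modelStruct$corStruct, unconstrained = FALSE)
```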

If you do try the model with random intercepts, random slopes and a temporal correlation structure then you will observe an error message;

#ovarylme6 <- lme(follicles~cos(2*pi*Time)+sin(2*pi*Time),
#                 random=~cos(2*pi*Time) |Mare,
#                 data=Ovary,correlation=corAR1(),method="ML")

this suggests that the optimisation algorithm used to maximise the likelihood function has run into difficulties. This usually occurs because there are too many parameters relative to the number of measurements and so the likelihood is very flat.

v. Look at the ACF plots for the residuals of your two models; what do you conclude?

Solution

We again use the ACF function, for example

plot(ACF(ovarylme4,resType="normalized"),alpha=0.05)
plot(ACF(ovarylme5,resType="normalized"),alpha=0.05)

The plots now just show random scatter about the horizontal axis, i.e. the serial correlation structure has been removed.

vi. Of all the models fitted, what would be your preferred model for this data? Why?

Solution

The AIC values for the two models with temporal correlation are

c(AIC(ovarylme4), AIC(ovarylme5))

for the AR(1) and ARMA(1,1) structures respectively. This compares to

AIC(ovarylme1)

for the model with a random intercept but no temporal correlation structure. Inclusion of temporal correlation does appear to improve model fit, but it would seem that an AR(1) structure is sufficient. This is further justified by the similarity of the ACF plots for the AR(1) and ARMA(1,1) models.
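As a final check on the chosen model, intervals() gives approximate 95% confidence intervals for the fixed effects, the random-effect standard deviation, the correlation parameter and the residual standard deviation (the model is refit here with the AR(1) process indexed by observation order so that the chunk is self-contained):

```r
library(nlme)

ovarylme4 <- lme(follicles ~ cos(2*pi*Time) + sin(2*pi*Time),
                 random = ~1 | Mare, data = Ovary,
                 correlation = corAR1(form = ~1 | Mare), method = "ML")

# Approximate 95% CIs for all estimated quantities
intervals(ovarylme4)
```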

vii. Update your panel plot from 1 (ii) to include the subject-level fits for your chosen model as well as the fixed effects only model. Comment on what you see.

Solution

The panel plot can be updated as follows:

Ovary2 <- Ovary  
Ovary2$fittedlm <- fitted(ovarylm)
Ovary2$fittedlme4 <- fitted(ovarylme4)

ggplot(data=Ovary2)+
  geom_point(aes(y=follicles,x=Time, group= Mare),color = "steelblue")+
  geom_line(aes(y=fittedlm, x=Time, group= Mare),linewidth=.8, linetype = "dashed")+
  geom_line(aes(y=fittedlme4, x=Time, group= Mare),linewidth=.8)+
  facet_wrap(. ~ Mare)

The between-subject variability in the cycles fitted using the random effects model (solid lines) is quite evident by comparison with the population behaviour obtained when ignoring the subject-level structure (dashed lines).


