%\VignettePackage{doBy}
%\VignetteIndexEntry{doBy: Linear estimates and LSmeans}
%\VignetteIndexEntry{LSMEANS}
%\VignetteIndexEntry{contrasts}
%\VignetteIndexEntry{estimable functions}

\documentclass[11pt]{article}

\usepackage{hyperref,color,boxedminipage,Sweave,a4wide}
\usepackage[utf8]{inputenc}
\usepackage{url}

\usepackage[authoryear,round]{natbib} \bibliographystyle{plainnat}

\RequirePackage{color,fancyvrb,amsmath,amsfonts}

%% \usepackage[utf8]{inputenc} %% \usepackage[inline,nomargin,draft]{fixme} %% \newcommand\FXInline[2]{\textbf{\color{blue}#1}: %% \emph{\color{blue} - #2}} %% \usepackage[cm]{fullpage}

\def\R{\texttt{R}} \def\pkg#1{{\bf #1}} \def\doby{\pkg{doBy}}

\def\code#1{\texttt{#1}} \def\cc#1{\texttt{#1}} \def\comi#1{{\it #1}} \def\esticon{\code{esticon()}} \def\lsmeans{\code{LSmeans}} \def\linmat{\code{LE_matrix()}} \def\linest{\code{linest()}}

% reduce whitespace between R code and R output
\let\oldknitrout\knitrout
\renewenvironment{knitrout}{
  \begin{oldknitrout}
  \footnotesize
  \topsep=0pt
}{
  \end{oldknitrout}
}

% show R> prompt before R commands
<<echo=FALSE>>=
options(prompt='R> ')
knitr::opts_chunk$set(prompt=TRUE)
@

\DeclareMathOperator{\EE}{\mathbb{E}} \DeclareMathOperator{\var}{\mathbb{V}ar} \DeclareMathOperator{\cov}{\mathbb{C}ov} \DeclareMathOperator{\norm}{N} \DeclareMathOperator{\spanx}{span} \DeclareMathOperator{\corr}{Corr} \DeclareMathOperator{\deter}{det} \DeclareMathOperator{\trace}{tr} \def\inv{^{-1}} \newcommand{\transp}{^{\top}}

<<echo=FALSE>>=
require(doBy)
prettyVersion <- packageDescription("doBy")$Version
prettyDate <- format(Sys.Date())
@

<<include=FALSE>>=
library(knitr)
opts_chunk$set(tidy=FALSE, fig.path='figures/LSmeans', fig.height=3.5)
@

\title{Linear estimates and LS--means in the \texttt{doBy} package} \author{S{\o}ren H{\o}jsgaard and Ulrich Halekoh} \date{\pkg{doBy} version \Sexpr{prettyVersion} as of \Sexpr{prettyDate}}

\begin{document} %%\SweaveOpts{concordance=TRUE}

\maketitle \hrule \tableofcontents

\parindent0pt \parskip5pt

<<echo=FALSE>>=
if (!dir.exists("figures")) dir.create("figures")
oopt <- options()
options("digits"=4, "width"=90, "prompt"="> ", "continue"=" ")
options(useFancyQuotes="UTF-8")
library(ggplot2)
@ %def

%\tableofcontents %% \setkeys{Gin}{height=3.5in, width=3.5in}

%% \begin{quote} %% This is a draft; please feel free to suggest improvements. %% \end{quote}

\section{Introduction} \label{sec:introduction}

\subsection{Linear functions of parameters} \label{sec:line-funct-param}

A linear function of a $p$--dimensional parameter vector $\beta$ has the form
\begin{displaymath}
  C=L\beta
\end{displaymath}
where $L$ is a $q\times p$ matrix which we call the \comi{Linear Estimate Matrix}, or simply the \comi{LE--matrix}. The corresponding linear estimate is $\hat C = L \hat \beta$. A linear hypothesis has the form $H_0: L\beta=m$ for some $q$--dimensional vector $m$.
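As a minimal sketch of these definitions (plain base \R\ on the built-in \code{cars} data, not specific to \doby), a linear estimate and its standard errors can be computed directly from a fitted model:

<<>>=
## Estimate the expected stopping distance at speeds 10 and 20
fm <- lm(dist ~ speed, data=cars)
L <- rbind(c(1, 10),
           c(1, 20))                 # one row per linear function
est <- L %*% coef(fm)                # L betahat
V <- L %*% vcov(fm) %*% t(L)         # covariance of L betahat
data.frame(estimate=as.numeric(est), se=sqrt(diag(V)))
@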

\subsection{Excerpt of the CO2 data} \label{sec:excerpt-co2-data}

For illustration we use subsets of the CO2 data:

<<>>=
data(CO2)
CO2 <- subset(CO2, conc < 300)
CO2.bal <- CO2
rownames(CO2.bal) <- NULL
CO2.bal %>% head
@

<<>>=
CO2.ubal <- CO2.bal[-c(1, 5, 12, 16, 17, 18, 19, 20, 28, 34, 35, 36),]
CO2.ubal
@


The balanced and the unbalanced data are shown below:

<<>>=
library(patchwork)
p1 <- ggplot(CO2.bal, aes(x=conc, y=uptake, color=Treatment, group=Plant)) +
    geom_line() + geom_point() + facet_grid(~Type)
p2 <- ggplot(CO2.ubal, aes(x=conc, y=uptake, color=Treatment, group=Plant)) +
    geom_line() + geom_point() + facet_grid(~Type)
p1 / p2
@

We consider an additive model (and, for later reference, a formula with interaction) and fit the additive model to both the balanced and the unbalanced data:

<<>>=
form.add <- uptake ~ conc + Treatment + Type
form.int <- uptake ~ conc * Treatment + Type
fm.bal <- lm(form.add, data=CO2.bal)
fm.ubal <- lm(form.add, data=CO2.ubal)
coef(fm.bal)
coef(fm.ubal)
@

\subsection{Linear estimates using \texttt{predict}} \label{sec:line-estim-using}

Consider this task: Estimate the value of the response \verb|uptake| when \verb|conc=100| for each combination of \verb|Type| and \verb|Treatment|. This can be obtained, for example, as follows:

<<>>=
dd <- expand.grid(Type = levels(CO2.bal$Type),
                  Treatment = levels(CO2.bal$Treatment),
                  conc = 100)
pp <- cbind(pred=predict(fm.bal, newdata=dd), dd)
pp
@

A related task: Estimate the value of the response \verb|uptake| when \verb|conc=100| for each level of \verb|Treatment|. Formally, this question cannot be answered under the additive model, because the model specifies the conditional mean of \verb|uptake| given \verb|conc|, \verb|Type| and \verb|Treatment|; there is, so to speak, no way of getting rid of \verb|Type|. There are two workarounds for this situation:

One workaround is to compute the average of the predictions above. That corresponds to integrating \verb|Type| out of the expectation, assuming that this factor has a uniform distribution in data.

Another workaround is to fit a model without the \verb|Type| effect:

<<>>=
fm2.bal <- update(fm.bal, .~. - Type)
dd2 <- expand.grid(Treatment = levels(CO2.bal$Treatment), conc = 100)
pp2 <- cbind(pred=predict(fm2.bal, newdata=dd2), dd2)
pp2
pp$pred %>% split(c("nonchilled", "nonchilled", "chilled", "chilled")) %>% sapply(mean)
@

The two approaches give identical results here because the dataset is balanced; with unbalanced data the results differ:

<<>>=
pp <- cbind(pred=predict(fm.ubal, newdata=dd), dd)
pp
fm2.ubal <- update(fm.ubal, .~. - Type)
pp2 <- cbind(pred=predict(fm2.ubal, newdata=dd2), dd2)
pp2
pp$pred %>% split(c("nonchilled", "nonchilled", "chilled", "chilled")) %>% sapply(mean)
@

The averaging approach can be carried out directly with \linmat\ and \linest:

<<>>=
L <- LE_matrix(fm.ubal, effect=~Treatment, at=list(conc=100))
L
linest(fm.ubal, L=L)
@

An LE--matrix can also be constructed by hand. For the additive model, the matrix giving predictions at \verb|conc=100| for each combination of \verb|Treatment| and \verb|Type| is:

<<>>=
L <- matrix(c(1, 100, 0, 0,
              1, 100, 0, 1,
              1, 100, 1, 0,
              1, 100, 1, 1), ncol=4, byrow=TRUE)
L
@

The corresponding LS--means are:

<<>>=
LSmeans(fm.bal, effect=~Type+Treatment, at=list(conc=100))
@

For \verb|Type| alone (averaging over \verb|Treatment|) we get:

<<>>=
L <- LE_matrix(fm.bal, effect=~Type, at=list(conc=100))
L
LSmeans(fm.bal, effect=~Type, at=list(conc=100))
@

\subsection{Tooth growth} \label{sec:tooth-growth}

The response is the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods: orange juice (coded OJ) or ascorbic acid, a form of vitamin C (coded VC).

% <<>>= % head(ToothGrowth, 4) % ftable(xtabs(~ dose + supp, data=ToothGrowth)) % @ %def

% <>= % opar <- par(mfrow = c(1, 2), oma = c(0, 0, 1.1, 0)) % plot(len ~ dose, data = ToothGrowth, col = "lightgray", % subset = supp == "OJ", main = "supp=OJ") % plot(len ~ dose, data = ToothGrowth, col = "lightgray", % subset = supp == "VC", main = "supp=VC") % mtext("ToothGrowth data", side = 3, outer = TRUE) % par(opar) % @ %def

% <<>>= % ToothGrowth %>% interaction_plot(len ~ dose + supp) % @

The data suggest a mild interaction between dose and delivery method, which can be assessed by a formal comparison of the additive model and the interaction model:

<< >>=
ToothGrowth$dose <- factor(ToothGrowth$dose)
head(ToothGrowth)
tooth1 <- lm(len ~ dose + supp, data=ToothGrowth)
tooth2 <- lm(len ~ dose * supp, data=ToothGrowth)
anova(tooth1, tooth2)
@

%% <>= %% tooth_avglen <- ToothGrowth %>% %% group_by(dose, supp) %>% %% summarise(val = mean(len)) %% tooth_avglen %% ggplot(ToothGrowth, aes(x = factor(dose), y = len, colour = supp)) + %% geom_boxplot(outlier.shape = 4) + %% geom_point(data = tooth_avglen, aes(y = val)) + %% geom_line(data = tooth_avglen, aes(y = val, group = supp)) %%@

\section{Computing linear estimates} \label{sec:comp-line-estim}

For now, we focus on the additive model:
<< >>=
tooth1
@

Consider computing the estimated length for each dose of orange juice (OJ). One option is to construct the LE--matrix $L$ directly:

<< >>=
L <- matrix(c(1, 0, 0, 0,
              1, 1, 0, 0,
              1, 0, 1, 0), nrow=3, byrow=TRUE)
@

Then do:

<< >>=
c1 <- linest(tooth1, L)
c1
@

We can do:

<< >>=
summary(c1)
coef(c1)
confint(c1)
@

\subsection{Automatic generation of $L$} \label{sec:autom-gener-l}

The matrix $L$ can be generated as follows:
<< >>=
L <- LE_matrix(tooth1, effect="dose", at=list(supp="OJ"))
L
@

\subsection{Alternatives - \texttt{esticon()}} \label{sec:alternatives}

An alternative is to do:

<< >>=
c1 <- esticon(tooth1, L)
c1
@

Notice: \verb|esticon| has been in the \doby\ package for many years; \verb|linest| is a newer addition; \verb|esticon| is not actively maintained but remains in \doby\ for historical reasons. Yet another alternative in this case is to generate a new data frame and then invoke predict (but this approach is not generally applicable, see later):

<< >>=
nd <- data.frame(dose=c('0.5', '1', '2'), supp='OJ')
nd
predict(tooth1, newdata=nd)
@

\section{Least-squares means (LS--means)} \label{sec:least-squares-means}

A related question could be: What is the estimated length for each dose if we ignore the source of vitamin C (i.e.\ whether it is OJ or VC)? One approach would be to fit a model in which the source does not appear:

<< >>=
tooth0 <- update(tooth1, . ~ . - supp)
L0 <- LE_matrix(tooth0, effect="dose")
L0
linest(tooth0, L=L0)
@

An alternative would be to stick to the original model but compute the estimate for an ``average vitamin C source''. That would correspond to giving weight $1/2$ to each of the two vitamin C source parameters. However, as one of the parameters is already set to zero to obtain identifiability, we obtain the LE--matrix $L$ as

<< >>=
L1 <- matrix(c(1, 0, 0, 0.5,
               1, 1, 0, 0.5,
               1, 0, 1, 0.5), nrow=3, byrow=TRUE)
linest(tooth1, L=L1)
@

Such a linear estimate is sometimes called a least-squares mean, an LS-mean or a marginal mean. Notice that the parameter estimates under the two approaches are identical. This is because data is balanced: there are $10$ observations per supplementation type. Had data not been balanced, the estimates would in general have been different.
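To make the connection concrete, here is a brute--force sketch: average the model predictions over the two \verb|supp| levels, and compare with the raw dose means (which agree with the LS--means here only because data is balanced):

<< >>=
nd.oj <- data.frame(dose=c("0.5", "1", "2"), supp="OJ")
nd.vc <- transform(nd.oj, supp="VC")
## average of the predictions over the two supp levels = LS-mean
(predict(tooth1, newdata=nd.oj) + predict(tooth1, newdata=nd.vc)) / 2
## raw dose means; equal to the LS-means because data is balanced
aggregate(len ~ dose, data=ToothGrowth, FUN=mean)
@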

Notice: One may generate $L$ automatically with
<< >>=
L1 <- LE_matrix(tooth1, effect="dose")
L1
@

Notice: One may obtain the LSmean directly as:

<< >>=
LSmeans(tooth1, effect="dose")
@

%% \subsection{Additive model} %% \label{sec:additive-model}

%% Returning to the \verb|ToothGrowth| data, orange juice and ascorbic %% acid are just two of many ways of supplying vitamin C (citrus and lime %% juice would be two alternatives). Here one can therefore argue, that %% it would make sense to estimate the the effect for each dose for an %% ``average vitamin C source'':

%% << >>= %% LSmeans(tooth1, effect="dose") %% @

which is the same as

<<>>=
L <- LE_matrix(tooth1, effect="dose")
le <- linest(tooth1, L=L)
coef(le)
@

%% \subsection{Interaction model} %% \label{sec:interaction-model}

For a model with interactions, the LSmeans are
<< >>=
LSmeans(tooth2, effect="dose")
@

In this case, the LE--matrix is
<< >>=
L <- LE_matrix(tooth2, effect="dose")
t(L)
@

\subsection{Using the \code{at=} argument} \label{sec:at-argument}

<<>>=
library(ggplot2)
ChickWeight$Diet <- factor(ChickWeight$Diet)
ggplot(ChickWeight, aes(x=Time, y=weight, colour=Chick)) +
    geom_point() + geom_line() + facet_wrap(~Diet) +
    theme(legend.position="none")
@ %def

Consider a random regression model:
<<>>=
library(lme4)
chick <- lmer(weight ~ Time * Diet + (0 + Time | Chick), data=ChickWeight)
coef(summary(chick))
@ %def

The LE--matrix for \cc{Diet} becomes:
<<>>=
L <- LE_matrix(chick, effect="Diet")
t(L)
@ %def

The value of \cc{Time} is by default taken to be the average of that variable. Hence the \lsmeans\ are the predicted weights for each diet at that specific point in time. We can consider other points in time with
<<>>=
K1 <- LE_matrix(chick, effect="Diet", at=list(Time=1))
t(K1)
@ %def

The \lsmeans\ for the intercepts are the predictions at \cc{Time=0}. The \lsmeans\ for the slopes become
<<>>=
K0 <- LE_matrix(chick, effect="Diet", at=list(Time=0))
t(K1 - K0)
linest(chick, L=K1 - K0)
@ %def

% We may be interested in finding the population means % at all levels of $A$ % but only at $C=1$. This is obtained by using the \code{at} argument % (in which case the average is only over the remaining factor $B$):

% <<>>= % popMatrix(mm, effect='A', at=list(C='1')) % @ %def

% Another way of % creating the population means % at all levels of $(A,C)$ is therefore

% <<>>= % popMatrix(mm, effect='A', at=list(C=c('1','2'))) % @ %def

% We may have several variables in the \code{at} argument: % @ % <<>>= % popMatrix(mm, effect='A', at=list(C=c('1','2'), B='1')) % @ %def

We can create our own function for comparing trends:

<<>>=
LSmeans_trend <- function(object, effect, trend){
    L <- LE_matrix(object, effect=effect, at=as.list(setNames(1, trend))) -
        LE_matrix(object, effect=effect, at=as.list(setNames(0, trend)))
    linest(object, L=L)
}
LSmeans_trend(chick, effect="Diet", trend="Time")
@ %def

\subsection{Ambiguous specification when using the \texttt{effect} and \texttt{at} arguments}

There is room for an ambiguous specification if a variable appears in both the \code{effect} and the \code{at} argument. The ambiguity arises because the \verb+effect+ argument asks for estimates at all levels of the variable while \verb+at+ picks out specific levels. The ambiguity is resolved as follows: any variable in the \code{at} argument is removed from the \code{effect} argument.

\subsection{Using (transformed) covariates} \label{sec:using-covariates}

Consider the following version of the \code{CO2} dataset:

<<>>=
data(CO2)
CO2 <- transform(CO2, Treat=Treatment, Treatment=NULL)
levels(CO2$Treat) <- c("nchil", "chil")
levels(CO2$Type) <- c("Que", "Mis")
ftable(xtabs( ~ Plant + Type + Treat, data=CO2), col.vars=2:3)
@ %def

<<>>=
ggplot(CO2, aes(x=log(conc), y=uptake, color=Treat)) +
    geom_point() + facet_grid(~Type)
@ %def

Below, the covariate \code{conc} is fixed at the average value:
<<>>=
co2.lm1 <- lm(uptake ~ conc + Type + Treat, data=CO2)
LSmeans(co2.lm1, effect="Treat")
@ %def
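As a check of the default behaviour described above, fixing \code{conc} explicitly at its mean via the \code{at=} argument should reproduce the result:

<<>>=
LSmeans(co2.lm1, effect="Treat", at=list(conc=mean(CO2$conc)))
@ %def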

If we use \code{log(conc)} instead, we get an error when calculating LS--means:
<<error=TRUE>>=
co2.lm <- lm(uptake ~ log(conc) + Type + Treat, data=CO2)
LSmeans(co2.lm, effect="Treat")
@ %def

In this case one can do
<<>>=
co2.lm2 <- lm(uptake ~ log.conc + Type + Treat,
              data=transform(CO2, log.conc=log(conc)))
LSmeans(co2.lm2, effect="Treat")
@ %def
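Before interpreting this, note that the average of \code{log(conc)} and the log of the average of \code{conc} are different quantities (a quick check in base \R):

<<>>=
c(mean(log(CO2$conc)), log(mean(CO2$conc)))
@ %def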

This also highlights what is computed: The average of the log of \code{conc}; not the log of the average of \code{conc}. In a similar spirit consider:

<<>>=
co2.lm3 <- lm(uptake ~ conc + I(conc^2) + Type + Treat, data=CO2)
LSmeans(co2.lm3, effect="Treat")
@ %def

Above, \verb'I(conc^2)' is the average of the squared values of \code{conc}; not the square of the average of \code{conc}, cf.\ the following.

<<>>=
co2.lm4 <- lm(uptake ~ conc + conc2 + Type + Treat,
              data=transform(CO2, conc2=conc^2))
LSmeans(co2.lm4, effect="Treat")
@ %def

If we want to evaluate the LS--means at \code{conc=10} then we can do:
<<>>=
LSmeans(co2.lm4, effect="Treat", at=list(conc=10, conc2=100))
@ %def

\section{Alternative models} \label{sec:alternative-models}

\subsection{Generalized linear models} \label{sec:gener-line-models}

We can calculate LS--means for e.g.\ a Poisson or a gamma model. By default, the calculation is carried out on the scale of the linear predictor. However, if we think of LS--means as predictions on the linear scale, one may argue that it also makes sense to transform these predictions to the response scale:

<<>>=
tooth.gam <- glm(len ~ dose + supp, family=Gamma, data=ToothGrowth)
LSmeans(tooth.gam, effect="dose", type="link")
LSmeans(tooth.gam, effect="dose", type="response")
@ %def
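To see what \code{type="response"} does to the point estimates, here is a small sketch; we assume that the response--scale estimates are simply the inverse link applied to the link--scale estimates (standard errors additionally require the delta method):

<<>>=
L <- LE_matrix(tooth.gam, effect="dose")
eta <- as.numeric(L %*% coef(tooth.gam))  # estimates on the link scale
family(tooth.gam)$linkinv(eta)            # estimates on the response scale
@ %def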

\subsection{Linear mixed effects model} \label{sec:linear-mixed-effects}

For the sake of illustration we treat \verb|supp| as a random effect:

<< >>=
library(lme4)
tooth.mm <- lmer(len ~ dose + (1|supp), data=ToothGrowth)
LSmeans(tooth1, effect="dose")
LSmeans(tooth.mm, effect="dose")
@

Notice here that the estimates themselves are identical to those of the linear model (that is not generally the case, but it is so here because data is balanced). In general the estimates will be very similar, but the standard errors are much larger under the mixed model. This is because \code{supp} is treated as a random effect.

<<>>=
VarCorr(tooth.mm)
@ %def

Notice that the degrees of freedom by default are adjusted using a Kenward--Roger approximation (provided that \pkg{pbkrtest} is installed). Unadjusted degrees of freedom are obtained by setting \verb|adjust.df=FALSE|.
<<>>=
LSmeans(tooth.mm, effect="dose", adjust.df=FALSE)
@ %def

% Notice that the degrees of freedom by default are adjusted using a % Kenward--Roger approximation (provided that \pkg{pbkrtest} is % installed). Unadjusted degrees of freedom are obtained with % <<>>= % LSmeans(warp.mm, effect="tension", adjust.df=FALSE) % @ %def

\subsection{Generalized estimating equations} \label{sec:gener-estim-equat}

Lastly, for gee-type ``models'' we get
<<>>=
library(geepack)
tooth.gee <- geeglm(len ~ dose, id=supp, family=Gamma, data=ToothGrowth)
LSmeans(tooth.gee, effect="dose")
LSmeans(tooth.gee, effect="dose", type="response")
@ %def

% Lastly, for gee-type ``models'' we get % <<>>= % library(geepack) % warp.gee <- geeglm(breaks ~ tension, id=wool, family=poisson, data=warpbreaks) % LSmeans(warp.gee, effect="tension") % LSmeans(warp.gee, effect="tension", type="response") % @ %def

\section{Miscellaneous} \label{sec:miscellaneous}

% \subsection{Under the hood} % \label{sec:under-hood}

% Under the hood, \cc{LSmeans()} generates a contrast matrix % <<>>= % K <- LE_matrix(warp.lm, effect="tension"); K % @ %def % and passes this matrix onto \linest: % <<>>= % linest( warp.lm, K=K ) % @ %def

\subsection{Example: Non--estimable linear functions} \label{sec:exampl-non-estim}

<<>>=
## Make balanced dataset
dat.bal <- expand.grid(list(AA=factor(1:2), BB=factor(1:3), CC=factor(1:3)))
dat.bal$y <- rnorm(nrow(dat.bal))

## Make unbalanced dataset: 'BB' is nested within 'CC' so BB=1
## is only found when CC=1 while BB=2,3 are found in each of CC=2,3,4
dat.nst <- dat.bal
dat.nst$CC <- factor(c(1,1,2,2,2,2,1,1,3,3,3,3,1,1,4,4,4,4))
@ %def

<< >>=
dat.nst
@

Consider this simulated dataset:
<<>>=
head(dat.nst, 4)
ftable(xtabs( ~ AA + BB + CC, data=dat.nst), row.vars="AA")
@ %def

Data is highly ``unbalanced'': Whenever \verb|BB=1| then \verb|CC| is always \verb|1|; whenever \verb|BB| is not \verb|1| then \verb|CC| is never \verb|1|. We have
<<>>=
mod.nst <- lm(y ~ AA + BB : CC, data=dat.nst)
coef(summary(mod.nst))
@ %def

In this case some of the \lsmeans\ values are not estimable; for example:
<<>>=
lsm.BC <- LSmeans(mod.nst, effect=c("BB", "CC"))
lsm.BC
lsm.BC2 <- LSmeans(mod.nst, effect="BB", at=list(CC=2))
lsm.BC2
@ %def

We describe the situation in Section~\ref{sec:handl-non-estim} where we focus on \verb|lsm.BC2|.

\subsection{Handling non--estimability} \label{sec:handl-non-estim}

The model matrix for the model in Section~\ref{sec:exampl-non-estim} does not have full column rank and therefore not all values are calculated by \cc{LSmeans()}.

<<>>=
X <- model.matrix(mod.nst)
Matrix::rankMatrix(X)
dim(X)
as(X, "Matrix")
@ %def
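Note that \code{lm()} itself flags the rank deficiency: the aliased coefficients are \code{NA} (and are omitted from \code{coef(summary(mod.nst))} shown earlier):

<<>>=
coef(mod.nst)
@ %def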

We consider a model, i.e.\ an $n$--dimensional random vector $y=(y_i)$ for which $\EE(y)=\mu=X\beta$ and $\cov(y)=V$, where $X$ does not have full column rank. We are interested in linear functions of $\beta$, say
\begin{displaymath}
  c=l\transp\beta= \sum_j l_j \beta_j .
\end{displaymath}

<<>>=
L <- LE_matrix(mod.nst, effect="BB", at=list(CC=2))
t(L)
linest(mod.nst, L=L)
@ %def

A least squares estimate of $\beta$ is \begin{displaymath} \hat \beta = G X\transp y \end{displaymath}

where $G$ is a generalized inverse of $X\transp X$. Since the generalized inverse is not unique, neither is the estimate $\hat\beta$. Hence $\hat c = l\transp\hat \beta$ is in general not unique either.

One least squares estimate of $\beta$ and one corresponding linear estimate $L\hat\beta$ is:
<<>>=
XtXinv <- MASS::ginv(t(X) %*% X)
bhat <- as.numeric(XtXinv %*% t(X) %*% dat.nst$y)
zapsmall(bhat)
L %*% bhat
@ %def

For some values of $l$ (i.e.\ for some rows of $L$) the estimate $\hat c=l\transp \hat\beta$ is unique (i.e.\ it does not depend on the choice of generalized inverse). Such linear functions are said to be estimable and can be described as follows:

All we specify with $\mu=X\beta$ is that $\mu$ is a vector in the column space $C(X)$ of $X$. We can only learn about $\beta$ through $X\beta$, so the only quantities we can say something about are linear combinations $\rho\transp X\beta$. Hence we can only say something about $l\transp\beta$ if there exists $\rho$ such that
$$
l\transp\beta=\rho\transp X \beta,
$$
i.e., if $l=X\transp\rho$ for some $\rho$, which is the case exactly when $l$ lies in the column space $C(X\transp)$ of $X\transp$. This is the same as saying that $l$ must be perpendicular to all vectors in the null space $N(X)$ of $X$. To check this, we find a basis $B$ for $N(X)$. This can be done in many ways, for example via a singular value decomposition of $X$, i.e.\
\begin{displaymath}
  X = U D V\transp .
\end{displaymath}
A basis for $N(X)$ is given by those columns of $V$ that correspond to zeros on the diagonal of $D$.

<<>>=
S <- svd(X)
B <- S$v[, S$d < 1e-10, drop=FALSE]
head(B)  ## basis for N(X)
@ %def

From
<< >>=
rowSums(L %*% B)
@
we conclude that the first row of $L$ is not perpendicular to all vectors in the null space $N(X)$ whereas the two last rows of $L$ are. Hence the latter two linear estimates are estimable; their values do not depend on the choice of generalized inverse:
<< >>=
lsm.BC2
@
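We can illustrate the invariance numerically. A standard fact is that if $G$ is a generalized inverse of $A$, then so is $G + (I - GA)U$ for any conformable matrix $U$; the sketch below (our own construction, not part of \doby) uses this to produce a second least squares estimate:

<< >>=
A <- t(X) %*% X
G <- MASS::ginv(A)
set.seed(2024)  # arbitrary seed, for illustration only
U <- matrix(rnorm(ncol(A)^2), nrow=ncol(A))
G2 <- G + (diag(ncol(A)) - G %*% A) %*% U  # another generalized inverse of A
bhat2 <- as.numeric(G2 %*% t(X) %*% dat.nst$y)
## the estimable rows (2 and 3) agree; the non-estimable row 1 differs
cbind(L %*% bhat, L %*% bhat2)
@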

\subsection{Pairwise comparisons} \label{sec:pairwise-comparisons}

We will just mention that for certain other linear estimates, the matrix $L$ can be generated automatically using \cc{glht()} from the \pkg{multcomp} package. For example, pairwise comparisons of all levels of \code{dose} can be obtained with

<< >>= library("multcomp") g1 <- glht(tooth1, mcp(dose="Tukey")) summary( g1 ) @

The $L$ matrix is
<< >>=
L <- g1$linfct
L
@
and this matrix can also be supplied to \verb|glht|:
<<>>=
glht(tooth1, linfct=L)
@
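In this additive setting the same contrasts can, as a sketch, also be built by differencing rows of the LE--matrix (the row names below are chosen by us for readability):

<< >>=
K <- LE_matrix(tooth1, effect="dose")
Lpair <- rbind("1 - 0.5" = K[2, ] - K[1, ],
               "2 - 0.5" = K[3, ] - K[1, ],
               "2 - 1"   = K[3, ] - K[2, ])
linest(tooth1, L=Lpair)
@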

% <<>>= % library("multcomp") % g1 <- glht(warp.lm, mcp(tension="Tukey")) % summary( g1 ) % @ %def

% The $K$ matrix generated in this case is: % <<>>= % K1 <- g1$linfct; K1 % @ %def

%%% -------------------------------------------------------------------- %%% from half.Rnw %%% --------------------------------------------------------------------

\section{LSmeans (population means, marginal means)}
\label{sec:lsmeans-pop-means}

\subsection{A simulated dataset} \label{sec:simulated-dataset}

In the following sections we consider these data:

<<>>=
library(doBy)
dd <- expand.grid(A=factor(1:3), B=factor(1:3), C=factor(1:2))
dd$y <- rnorm(nrow(dd))
dd$x <- rnorm(nrow(dd))^2
dd$z <- rnorm(nrow(dd))
head(dd, 10)
@ %def

Consider the additive model \begin{equation} \label{eq:1} y_i = \beta_0 + \beta^1_{A(i)}+\beta^2_{B(i)} + \beta^3_{C(i)} + e_i \end{equation} where $e_i \sim N(0,\sigma^2)$. We fit this model:

<<>>=
mm <- lm(y ~ A + B + C, data=dd)
coef(mm)
@ %def

Notice that the parameters corresponding to the factor levels \code{A1}, \code{B1} and \code{C1} are set to zero to ensure identifiability of the remaining parameters.

\subsection{What are these quantities} \label{sec:what-are-these}

LSmeans, population means and marginal means are used synonymously in the literature. These quantities are a special kind of linear estimates as defined in Section~\ref{sec:line-funct-param}. LSmeans seems to be the most widely used term, so we shall adopt this term here too.

The model (\ref{eq:1}) is a model for the conditional mean $\EE(y|A,B,C)$. Sometimes one is interested in quantities like $\EE(y|A)$. This quantity can not formally be found unless $B$ and $C$ are random variables such that we may find $\EE(y|A)$ by integration. However, suppose that $A$ is a treatment of main interest, $B$ is a blocking factor and $C$ represents days on which the experiment was carried out. Then it is tempting to average $\EE(y|A,B,C)$ over $B$ and $C$ (average over block and day) and think of this average as $\EE(y|A)$.

% \subsection{A brute--force calculation} % \label{sec:xxx}

The population mean for $A=1$ is \begin{equation} \label{eq:2} \beta^0 + \beta^1_{A1} + \frac{1}{3} (\beta^2_{B1}+\beta^2_{B2}+\beta^2_{B3}) + \frac{1}{2}(\beta^3_{C1}+\beta^3_{C2}) \end{equation}

Recall that the parameters corresponding to the factor levels \code{A1}, \code{B1} and \code{C1} are set to zero to ensure identifiability of the remaining parameters. Therefore we may also write the population mean for $A=1$ as
\begin{equation}
  \label{eq:3}
  \beta^0 + \frac{1}{3} (\beta^2_{B2}+\beta^2_{B3}) + \frac{1}{2}\beta^3_{C2}
\end{equation}

This quantity can be estimated as:

<<>>=
w <- c(1, 0, 0, 1/3, 1/3, 1/2)
coef(mm) * w
sum(coef(mm) * w)
@ %def

We may find the population mean for all three levels of $A$ as

<<>>=
W <- matrix(c(1, 0, 0, 1/3, 1/3, 1/2,
              1, 1, 0, 1/3, 1/3, 1/2,
              1, 0, 1, 1/3, 1/3, 1/2), nrow=3, byrow=TRUE)
@ %def

Notice that the matrix $W$ relies on the first level of $A$ being the reference level. If the reference level is changed, then $W$ must be changed accordingly.
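A small sketch of why this matters (the releveled model is only for illustration): changing the reference level of $A$ changes the parametrization, and hence the weights in $W$ would have to change as well.

<<>>=
dd.rel <- transform(dd, A=relevel(A, ref="2"))
coef(lm(y ~ A + B + C, data=dd.rel))
@ %def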

% \subsection{Using \esticon} % \label{sec:xxx}

Given that one has specified $W$, we can use the \esticon\ function in the \doby\ package as illustrated below:

<<>>= esticon(mm, W) @ %def

\def\popmatrix{\code{popMatrix()}}
\def\popmeans{\code{popMeans()}}

In the remainder we use \popmatrix\ and \popmeans; these are older names for \linmat\ and \lsmeans, which we define here as aliases:

<<>>=
popMatrix <- LE_matrix
popMeans <- LSmeans
@

\subsection{Using \popmatrix\ and \popmeans}
\label{sec:using-popmatrix}

Writing the matrix $W$ is somewhat tedious and hence error prone. In addition, there is a potential risk of getting the wrong answer if the reference level of a factor has been changed. The \popmatrix\ function provides an automated way of generating such matrices. The above \verb+W+ matrix is constructed by

<<>>=
pma <- popMatrix(mm, effect='A')
summary(pma)
@ %def

The \verb+effect+ argument requests the population means for each level of $A$, aggregating across the levels of the other variables in the data.

The \popmeans\ function is simply a wrapper which first calls \popmatrix\ and then passes the result to (by default) \esticon:

<<>>=
pme <- popMeans(mm, effect='A')
pme
@ %def

More details about how the matrix was constructed are provided by the \code{summary()} function:

<<>>=
summary(pme)
@ %def

As an additional example we may do:

<<>>=
popMatrix(mm, effect=c('A','C'))
@ %def
This gives the matrix for calculating the estimate for each combination of \code{A} and \code{C} when averaging over \code{B}.

Omitting \code{effect} as in

<<>>=
popMatrix(mm)
popMeans(mm)
@ %def
gives the ``total average''.

% \subsection{Using the \code{at} argument}

% We may be interested in finding the population means % at all levels of $A$ % but only at $C=1$. This is obtained by using the \code{at} argument % (in which case the average is only over the remaining factor $B$):

% <<>>= % popMatrix(mm, effect='A', at=list(C='1')) % @ %def

% Another way of % creating the population means % at all levels of $(A,C)$ is therefore

% <<>>= % popMatrix(mm, effect='A', at=list(C=c('1','2'))) % @ %def

% We may have several variables in the \code{at} argument: % @ % <<>>= % popMatrix(mm, effect='A', at=list(C=c('1','2'), B='1')) % @ %def

% \subsection{Ambiguous specification when using the \texttt{effect} and % \texttt{at} arguments}

% There is room for an ambiguous specification if a variable appears in % both the \code{effect} and the \code{at} argument, such as % @ % <<>>= % popMatrix(mm, effect=c('A','C'), at=list(C='1')) % @ %def

% This ambiguity is due to the fact that the \verb+effect+ argument asks % for the populations means at all levels of the variables but the % \verb+at+ chooses only specific levels.

% This ambiguity is resolved as follows: Any variable in the \code{at} % argument is removed from the \code{effect} argument such as the % statement above is equivalent to % @ % <>= % ## popMatrix(mm, effect='A', at=list(C='1')) % @ %def

% \subsection{Using covariates}

% Next consider the model where a covariate is included: % @ % <<>>= % mm2 <- lm(y~A+B+C+C:x, data=dd) % coef(mm2) % @ %def

% In this case we get % <<>>= % popMatrix(mm2,effect='A', at=list(C='1')) % @ %def

% Above, $x$ has been replaced by its average and that is the general % rule for models including covariates. However we may use the \code{at} % argument to ask for calculation of the population mean at some % user-specified value of $x$, say 12: % <<>>= % popMatrix(mm2,effect='A', at=list(C='1',x=12)) % @ %def

% <<>>= % mm22 <- lm(y~A+B+C+C:x+I(x^2), data=dd) % coef(mm22) % @ %def

% <<>>= % popMatrix(mm22,effect='A', at=list(C='1')) % @ %def

% <<>>= % dd <- transform(dd, x.sq=x^2) % mm23 <- lm(y~A+B+C+C:x+x.sq, data=dd) % coef(mm23) % popMatrix(mm23,effect='A', at=list(C='1')) % @ %def

% \subsection{Using transformed covariates}

% Next consider the model where a transformation of a covariate is included: % @ % <<>>= % mm3 <- lm(y~A+B+C+C:I(log(x)), data=dd) % coef(summary(mm3)) % @ %def

% In this case we can not use \popmatrix\ (and hence % \popmeans) directly. Instead we first have to % generate a new variable, say \verb+log.x+, with % \verb+log.x+$=\log(x)$, in the data and then proceed as

% <<>>= % dd <- transform(dd, log.x = log(x)) % mm32 <- lm(y~A+B+C+C:log.x, data=dd) % popMatrix(mm32, effect='A', at=list(C='1')) % @ %def

% \subsection{The \code{engine} argument of \popmeans}

% The \popmatrix is a function to generate a linear tranformation matrix of the model % parameters with emphasis on constructing such matrices for population % means. % \popmeans\ invokes by default the \esticon\ function on this % linear transformation matrix for calculating parameter estimates and % confidecne intervals. % A similar function to \esticon\ is the \verb+glht+ function of the \verb+multcomp+ % package.

% The \code{glht()} function % can be chosen via the \verb+engine+ argument of \popmeans:

% <<>>= % library(multcomp) % g<-popMeans(mm,effect='A', at=list(C='1'),engine="glht") % g % @ %def

% This allows to apply the methods available on the \verb+glht+ object like

% <<>>= % summary(g,test=univariate()) % confint(g,calpha=univariate_calpha()) % @ % which yield the same results as the \esticon\ function.

% By default the functions will adjust the tests and confidence intervals for multiplicity % <<>>= % summary(g) % confint(g) % @

%%\bibliography{doBy}

\end{document}

