Basics of estimated marginal means
In emmeans: Estimated Marginal Means, aka Least-Squares Means

require("emmeans")
require("ggplot2")
knitr::opts_chunk$set(fig.width = 4.5, class.output = "ro")

1. Foundations
    a. Emphasis on experimental data
    b. Emphasis on models
    c. Illustration: pigs experiment
    d. Estimated marginal means
    e. The reference grid, and definition of EMMs
    f. More on the reference grid
2. Other topics
    a. Passing arguments
    b. Transformations
    c. Derived covariates
    d. Non-predictor variables
    e. Graphical displays
    f. Formatting results
    g. Using weights
    h. Multivariate responses
3. Objects, structures, and methods
4. P values, "significance", and recommendations
5. Summary
6. Further reading

Index of all vignette topics

Foundations {#found}

Emphasis on experimental data {#exper}

To start off with, we should emphasize that the underpinnings of estimated marginal means -- and much of what the emmeans package offers -- relate more to experimental data than to observational data. In observational data, we sample from some population, and the goal of statistical analysis is to characterize that population in some way. In contrast, with experimental data, the experimenter controls the environment under which test runs are conducted, and in which responses are observed and recorded. Thus with experimentation, the population is an abstract entity consisting of potential outcomes of test runs made under conditions we enforce, rather than a physical entity that we observe without changing it.

We say this because the default behavior of the emmeans() function is to average groups together with equal weights. This is common in the analysis of experiments, but not in the analysis of observational data, and I think that misunderstandings about this default underlie some of the criticisms that have been raised.

Consider, for example, a classic Latin square experimental design. R. A. Fisher and others expounded on such designs. Suppose we want to compare four treatments, say fertilizers, in an agricultural experiment. A Latin square plan would involve dividing a parcel of land into four rows and four columns, defining 16 plots. Then we apply one of the fertilizers to each plot in such a way that each fertilizer appears once in each row and once in each column (and thus, each row and each column contains all four fertilizers). This scheme, to some extent, controls for possible spatial effects within the land parcel. To compare the fertilizers, we average together the response values (say, yield of a crop) observed on the four plots where each fertilizer was used. It seems right to average these together with equal weight, because each experimental condition seems equally valid and there is no reason to give one more weight than another. In this illustration, the fertilizer means are not marginal means of some physical population; they are simply the means obtained under the four test conditions defined by the experiment.
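The Latin square idea can be sketched in a few lines of base R. This is only an illustration: the cyclic layout, the object names (`layout`, `plots`, `fert_means`), and the simulated yields are all assumptions made up for this example, not data from any real experiment.

```r
# Hypothetical 4 x 4 Latin square: rows and columns are plot positions,
# letters A-D stand for the four fertilizers (cyclic layout, for illustration).
trt <- LETTERS[1:4]
layout <- t(sapply(0:3, function(shift) trt[(0:3 + shift) %% 4 + 1]))

# Check the defining property: each fertilizer appears once per row and column.
stopifnot(all(apply(layout, 1, function(r) setequal(r, trt))),
          all(apply(layout, 2, function(cl) setequal(cl, trt))))

# Made-up yields, for illustration only (not real data).
set.seed(1)
plots <- data.frame(
  row        = factor(rep(1:4, times = 4)),   # as.vector() is column-major,
  col        = factor(rep(1:4, each = 4)),    # so the row index varies fastest
  fertilizer = factor(as.vector(layout))
)
plots$yield <- round(rnorm(16, mean = 50, sd = 5), 1)

# Equal-weight average of the four plots that received each fertilizer.
fert_means <- tapply(plots$yield, plots$fertilizer, mean)
fert_means
```

Each of the four means is a plain average over four plots, one from every row and every column, so no plot counts for more than another.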

Emphasis on models {#models}

The emmeans package requires you to fit a model to your data. All the results obtained in emmeans rely on this model. So the analysis obtained is really an analysis of the model, not the data. This analysis does depend on the data, but only insofar as the fitted model depends on the data. We use predictions from this model to compute estimated marginal means (EMMs), which will be defined more explicitly below. For now, there are two things to know:

1. If you change the model, that changes the EMMs.
2. If the model fits poorly, the EMMs represent the data poorly (the garbage in, garbage out principle).

So to use this package to analyze your data, the most important first step is to fit a good model.
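As a rough sketch of the first point, the code below fits two different models to the `pigs` dataset that ships with emmeans and compares the resulting EMMs for `source`. The two model formulas and the object names (`mod_full`, `mod_source`, and so on) are chosen here purely for illustration; neither model is claimed to be a good one for these data.

```r
library(emmeans)

# Two candidate models for the same response in the pigs data
mod_full   <- lm(conc ~ source + factor(percent), data = pigs)
mod_source <- lm(conc ~ source, data = pigs)

# EMMs for source under each model; summary() returns a data frame
emm_full   <- summary(emmeans(mod_full, "source"))
emm_source <- summary(emmeans(mod_source, "source"))

# Side-by-side comparison of the estimates from the two models
data.frame(source          = emm_full$source,
           emm_full        = emm_full$emmean,
           emm_source_only = emm_source$emmean)
```

Because the `pigs` data are unbalanced, the two columns should not agree: the estimates come from the fitted model, so a different model yields different EMMs.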


Illustration: `pigs` experiment {#pigs}