nauf-pmmeans: Predicted marginal means for 'nauf' models.
In CDEager/nauf: Regression with NA Values in Unordered Factors


Create a reference grid for a nauf model with nauf_ref.grid, and use the resulting nauf.ref.grid as the object argument to nauf_pmmeans to obtain predicted marginal means and pairwise comparisons, optionally conditioning these predictions on certain subsets of the data via the subset argument.

nauf_ref.grid(mod, KR = FALSE, ...)

nauf_pmmeans(object, specs, pairwise = FALSE, subset = NULL,
  na_as_level = NULL, by = NULL, ...)

`mod`	A regression model fit with `nauf` contrasts.
`KR`	Only applies when `mod` is a `nauf.lmerMod` fit with `REML = TRUE`. If `KR = TRUE`, then the Kenward-Roger approximation is used to calculate degrees of freedom. If `KR = FALSE` (the default), then the Satterthwaite approximation is used. When `mod` is a `nauf.lmerMod` fit with `REML = FALSE`, then the Satterthwaite approximation is always used. The Kenward-Roger method is implemented with `Lb_ddf` and the Satterthwaite method is implemented with `calcSatterth`.
`...`	Additional arguments are ignored with a warning.
`object`	A `nauf.ref.grid` object created with `nauf_ref.grid`.
`specs`	The fixed effects for which the full interaction term should be considered in the calculation of predicted marginal means. The preferred method is to specify the variables as a character vector. However, they can also be specified on the right hand side of a formula, optionally with the keyword `pairwise` on the left hand side to indicate that pairwise comparisons should be performed.
`pairwise`	A logical (default `FALSE`) indicating whether pairwise comparisons of the predicted marginal means should be performed. If `specs` is a formula, then the `pairwise` argument is ignored and the left hand side of the formula is used to determine whether pairwise comparisons should be made. If `by` is not `NULL` then `pairwise` is forced to `TRUE`.
`subset`	A list indicating which subsets of the reference grid should be considered in the calculation of the predicted marginal means. See 'Details'.
`na_as_level`	A character vector of unordered factors in `specs` that have `NA` values that should be considered as levels. The default `NULL` indicates that `NA` should not be considered as a level for any unordered factors in `specs`. See 'Details'.
`by`	An optional character vector specifying unordered factors in `specs`. If specified, then pairwise comparisons are performed within each level of the full interaction term of the factors, rather than for all possible combinations. If an unordered factor listed in `by` is not included in `specs`, it is added to `specs` automatically.

A reference grid creates a data frame which contains all possible combinations of the factors in a regression model, holding all covariates at their mean values. There are many options for ref.grid which are not currently supported for nauf models. The main functionality which is not currently supported is that the reference grid cannot be created specifying certain levels for variables (i.e. the at argument; this is handled through the subset argument to nauf_pmmeans). A direct call to ref.grid will result in warnings (or possibly errors), and inference made with the resulting object will be misleading and/or incorrect. Only nauf_ref.grid should be used. The nauf.ref.grid returned by nauf_ref.grid can then be used as the object argument to nauf_pmmeans to obtain predicted marginal means and pairwise comparisons with p-values that adjust for familywise error rate.
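The idea of a reference grid can be sketched in base R. This is only an illustration of the concept, not what nauf_ref.grid does internally; the data frame dat and its columns f1, f2, and x are hypothetical.

```r
# Hypothetical toy data; f1, f2 and x are illustrative names only.
dat <- data.frame(
  f1 = factor(c("A", "A", "B", "B", "C", "C")),
  f2 = factor(c("D", "E", "D", "E", "D", "E")),
  x  = c(1, 2, 3, 4, 5, 6)
)

# All combinations of the factor levels, with the covariate held at its mean.
grid <- expand.grid(f1 = levels(dat$f1), f2 = levels(dat$f2))
grid$x <- mean(dat$x)

# One row per combination: 3 levels of f1 times 2 levels of f2.
nrow(grid)
```

Predictions for each row of such a grid are the raw material from which marginal means are averaged.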

The specs and pairwise arguments to nauf_pmmeans indicate which variables marginal means should be calculated for and whether pairwise comparisons of these means should be made. If specs is a character vector, then pairwise is used; if specs is a formula, then the full interaction of the terms on the right hand side of the formula is considered, and the left hand side is used to indicate pairwise comparisons. For example (where rg is a nauf.ref.grid):

# all of these calculate pmm's for each combination of the factors f1 and f2
# but not pairwise comparisons
nauf_pmmeans(rg, c("f1", "f2"))
nauf_pmmeans(rg, ~ f1 + f2)
nauf_pmmeans(rg, ~ f1 * f2)
nauf_pmmeans(rg, ~ f1:f2)

# all of these calculate the same pmm's, and additionally pairwise comparisons
nauf_pmmeans(rg, c("f1", "f2"), pairwise = TRUE)
nauf_pmmeans(rg, pairwise ~ f1 + f2)
nauf_pmmeans(rg, pairwise ~ f1 * f2)
nauf_pmmeans(rg, pairwise ~ f1:f2)
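
Why the formula variants above are equivalent can be seen from a small base R sketch. The helper parse_specs is hypothetical and is not nauf's actual parsing code; it only shows that every form reduces to the same set of variables plus a pairwise flag.

```r
# Hypothetical helper, not part of nauf: extract the variables and the
# pairwise flag from a specs value given as a character vector or a formula.
parse_specs <- function(specs, pairwise = FALSE) {
  if (inherits(specs, "formula")) {
    # A two-sided formula has length 3: `~`, left hand side, right hand side.
    two_sided <- length(specs) == 3
    # For a formula, the pairwise argument is ignored in favor of the LHS.
    pairwise <- two_sided && identical(specs[[2]], as.name("pairwise"))
    # The right hand side is always the last element of the formula.
    vars <- all.vars(specs[[length(specs)]])
  } else {
    vars <- as.character(specs)
  }
  list(vars = vars, pairwise = pairwise)
}

parse_specs(~ f1 * f2)
parse_specs(pairwise ~ f1:f2)
parse_specs(c("f1", "f2"), pairwise = TRUE)
```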

If specs indicates a single covariate, the effect of an increase of 1 in the covariate is computed. If specs indicates multiple covariates, the effect of a simultaneous increase of 1 in all of the covariates is computed. If specs indicates a combination of factors and covariate(s), then the effect of an increase of 1 for the covariates is calculated for each level of the full interaction of the factors.
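
The quantity described for a covariate within levels of a factor can be illustrated with an ordinary lm fit. This sketch uses default treatment contrasts on the built-in mtcars data, not a nauf model, and is meant only to show what "the effect of an increase of 1" refers to.

```r
# Ordinary lm with default treatment contrasts on a built-in data set;
# this is not a nauf model, just an illustration of the quantity computed.
dat <- transform(mtcars, am = factor(am))
fit <- lm(mpg ~ wt * am, data = dat)

# Predictions at the mean of wt and at the mean plus 1, for each level of am.
base <- data.frame(am = factor(levels(dat$am), levels = levels(dat$am)),
                   wt = mean(dat$wt))
plus1 <- transform(base, wt = wt + 1)

# Effect of a one-unit increase in wt within each level of am.
effect <- predict(fit, plus1) - predict(fit, base)
names(effect) <- levels(dat$am)
effect
```

In a linear model these differences are the per-level slopes of wt, so they do not depend on the value at which the covariate is held.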

If by is specified, then pairwise is forced to TRUE, and pairwise comparisons are performed within each level of the full interaction of the factors listed in by, rather than performing all possible pairwise comparisons. For example, if there are two factors, f1 with levels A, B, and C, and f2 with levels D and E:

# this will produce six pmmeans (A:D, A:E, B:D, B:E, C:D, C:E) and
# all 15 pairwise comparisons
nauf_pmmeans(rg, c("f1", "f2"), pairwise = TRUE)

# this would produce the same six pmmeans, but only three pairwise
# comparisons (A:D - A:E, B:D - B:E, C:D - C:E)
nauf_pmmeans(rg, c("f1", "f2"), by = "f1")

# this would produce the same six pmmeans, but only six pairwise comparisons
# (A:D - B:D, A:D - C:D, B:D - C:D, A:E - B:E, A:E - C:E, B:E - C:E)
nauf_pmmeans(rg, c("f1", "f2"), by = "f2")
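
The comparison counts in the comments above can be checked with a short base R enumeration. The helper pairs_within is hypothetical and only lists which cells would be compared; it does not compute any estimates.

```r
f1 <- c("A", "B", "C")
f2 <- c("D", "E")

# The six cells of the full f1:f2 interaction.
cells <- expand.grid(f1 = f1, f2 = f2, stringsAsFactors = FALSE)
cells$label <- paste(cells$f1, cells$f2, sep = ":")

# Hypothetical helper: list comparisons, optionally within levels of `by`.
pairs_within <- function(cells, by = NULL) {
  groups <- if (is.null(by)) {
    list(cells$label)
  } else {
    split(cells$label, cells[[by]])
  }
  # Every unordered pair of cells inside each group.
  unlist(lapply(groups, function(g) combn(g, 2, paste, collapse = " - ")),
         use.names = FALSE)
}

length(pairs_within(cells))     # all pairs among the six cells
pairs_within(cells, by = "f1")  # pairs within each level of f1
pairs_within(cells, by = "f2")  # pairs within each level of f2
```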

The reference grid returned by nauf_ref.grid contains combinations of factors which are not actually possible in the data set. For example, if factor f1 has levels A and B, and factor f2 is NA when f1 = A, and takes values C and D when f1 = B, the reference grid will still contain the combinations f1 = A, f2 = C; f1 = A, f2 = D; and f1 = B, f2 = NA, even though these combinations are not possible. This is because it is impossible to know without the user's knowledge which combinations make sense. In many cases, this is inconsequential for the computation of predicted marginal means, since the coding of unordered factors in nauf regressions will average over the effects.

In cases where these rows in the reference grid will cause invalid estimates and pairwise comparisons, the subset argument can be used in the call to nauf_pmmeans to ensure only the correct subsets are considered. The default for the subset argument is NULL, indicating that the entire reference grid should be considered. If not NULL, then subset must be a list which defines the valid subsets as lists of named character vectors, where the name of the character vector is an unordered factor in the model, and the vector itself contains the levels which define the subset (including NA in the case of factors which have NA values; when NA is specified as a level, there should be no quotes around it). Any row in the reference grid which matches the definition of at least one of the groups defined in subset is kept, and all others are dropped.

So, continuing with the f1 and f2 example, if f2 = NA corresponds to f2 = D in meaning, and is coded as NA because all f1 = A observations are by necessity f2 = D, then to analyze the effect of f1, we want to compare the groups f1 = A, f2 = NA and f1 = B, f2 = D, which we could do with the following call:

nauf_pmmeans(rg, "f1", subset = list(
  list(f1 = "A", f2 = NA), list(f1 = "B", f2 = "D")))

This would produce an estimate for f1 = A and f1 = B, but conditioning on the subset where f1 is truly contrastive based on f2. If, on the other hand, f2 = NA does not correspond in interpretation to either f2 = C or f2 = D, but rather indicates that f2 is simply not meaningful when f1 = A, we would want to average over the effect of f2 within f1 = B, and compare this result to f1 = A, f2 = NA, which we could do with the following call:

nauf_pmmeans(rg, "f1", subset = list(
  list(f1 = "A", f2 = NA), list(f1 = "B", f2 = c("C", "D"))))

Here, the second sub-list in the subset list indicates that a row with f1 = B and either f2 = C or f2 = D belongs to the second subset. In this case, the subset argument is actually not necessary, since for f1 = A we do not want to consider the effect of f2, and for f1 = B we want to average over all possible levels of f2, and these are actually the same thing computationally for unordered factors in nauf models. That is, we would get the same result with:

nauf_pmmeans(rg, "f1")

Generally speaking, if none of the factors in specs contain NA values, then the subset argument is unnecessary. If any of the factors in specs do contain NA values, then you will almost always want to use the subset argument. Now consider that we are interested in f2. Because f2 is only contrastive when f1 = B, we probably want to call:

# note that because there are not multiple subsets being specified, you
# don't have to specify subset = list(list(f1 = "B")); nauf_pmmeans will
# assume list(f1 = "B") means list(list(f1 = "B"))
nauf_pmmeans(rg, "f2", subset = list(f1 = "B"))

This call will produce two estimates, one for f2 = C and one for f2 = D, conditioning on f1 = B. There will be no estimate for f2 = NA because, by default, no estimates are produced for combinations of factors where one factor is NA. If we wanted to compare the three possible groups (i.e. f1 = A, f2 = NA; f1 = B, f2 = C; and f1 = B, f2 = D), then we could additionally use the na_as_level argument and change our subset:

nauf_pmmeans(rg, "f2", subset = list(
  list(f1 = "A", f2 = NA), list(f1 = "B", f2 = c("C", "D"))),
  na_as_level = "f2")

# this gives the same estimates, but the output will also show the
# corresponding level of f1, which is more transparent
nauf_pmmeans(rg, c("f1", "f2"), subset = list(
  list(f1 = "A", f2 = NA), list(f1 = "B", f2 = c("C", "D"))),
  na_as_level = "f2")

The easiest way to use the subset argument is to create a list, outside of nauf_pmmeans, that defines the valid subsets for the different regression terms of interest, and then use the relevant element of the list in the nauf_pmmeans call. For example:

pmmsubs <- list()
pmmsubs$f1 <- list(list(f1 = "A", f2 = NA), list(f1 = "B", f2 = c("C", "D")))
pmmsubs$f2 <- list(f1 = "B")

nauf_pmmeans(rg, "f1", subset = pmmsubs$f1)

nauf_pmmeans(rg, "f2", subset = pmmsubs$f2)

nauf_pmmeans(rg, c("f1", "f2"), subset = pmmsubs$f1, na_as_level = "f2")

This way you can just define the different subsets of the data once and not have to think about it at every nauf_pmmeans call.
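
The row-matching rule for subset described above (keep any row that matches at least one group) can be sketched in base R. The helpers matches_group and keep_rows are hypothetical and operate on a plain data frame standing in for the reference grid; they are not nauf's implementation.

```r
# Toy stand-in for a reference grid with f1 (A, B) and f2 (C, D, NA).
grid <- expand.grid(f1 = c("A", "B"), f2 = c("C", "D", NA),
                    stringsAsFactors = FALSE)

# Hypothetical helper: TRUE for rows that match every factor in one group.
# %in% treats NA as a matchable value, so f2 = NA selects the NA rows.
matches_group <- function(grid, group) {
  hits <- lapply(names(group), function(v) grid[[v]] %in% group[[v]])
  Reduce(`&`, hits)
}

# Hypothetical helper: keep rows that match at least one group in subset.
keep_rows <- function(grid, subset) {
  # A single group such as list(f1 = "B") is treated as list(list(f1 = "B")).
  if (!is.list(subset[[1]])) subset <- list(subset)
  keep <- Reduce(`|`, lapply(subset, matches_group, grid = grid))
  grid[keep, , drop = FALSE]
}

keep_rows(grid, list(list(f1 = "A", f2 = NA),
                     list(f1 = "B", f2 = c("C", "D"))))
keep_rows(grid, list(f1 = "B"))
```

The first call drops the three impossible combinations (f1 = A with f2 = C or D, and f1 = B with f2 = NA); the second keeps every row with f1 = B.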

nauf_ref.grid returns a nauf.ref.grid object, which is just a list with one element, ref.grid, of class ref.grid. This reference grid should not be used directly with lsmeans, but rather only with nauf_pmmeans. nauf_pmmeans returns a nauf.pmm.list object.

nauf_contrasts, nauf_glm, nauf_glmer, nauf_stan_glm, and nauf_stan_glmer.