grc: Row-Column Interaction Models including Goodman's RC...
In VGAM: Vector Generalized Linear and Additive Models

View source: R/family.rrr.R

grc	R Documentation

Row-Column Interaction Models including Goodman's RC Association Model and Unconstrained Quadratic Ordination

Description

Fits a Goodman's RC association model (GRC) to a matrix of counts, and more generally, row-column interaction models (RCIMs). RCIMs allow for unconstrained quadratic ordination (UQO).

Usage

grc(y, Rank = 1, Index.corner = 2:(1 + Rank),
    str0 = 1, summary.arg = FALSE, h.step = 1e-04, ...)
rcim(y, family = poissonff, Rank = 0, M1 = NULL,
     weights = NULL, which.linpred = 1,
     Index.corner = ifelse(is.null(str0), 0, max(str0)) + 1:Rank,
     rprefix = "Row.", cprefix = "Col.", iprefix = "X2.",
     offset = 0, str0 = if (Rank) 1 else NULL,
     summary.arg = FALSE, h.step = 0.0001,
     rbaseline = 1, cbaseline = 1,
     has.intercept = TRUE, M = NULL, rindex = 2:nrow(y),
     cindex = 2:ncol(y), iindex = 2:nrow(y), ...)

Arguments

`y`	For `grc()`: a matrix of counts. For `rcim()`: a general matrix response depending on `family`. Output from `table()` is acceptable; it is converted into a matrix. Note that `y` should be at least 3 by 3 in dimension.
`family`	A VGAM family function. By default, the first linear/additive predictor is fitted using main effects plus an optional rank-`Rank` interaction term. Not all family functions are suitable or make sense. All other linear/additive predictors are fitted using an intercept-only, so it has a common value over all rows and columns. For example, `zipoissonff` may be suitable for counts but not `zipoisson` because of the ordering of the linear/additive predictors. If the VGAM family function does not have an `infos` slot then `M1` needs to be inputted (the number of linear predictors for an ordinary (usually univariate) response, aka `M`). The VGAM family function also needs to be able to handle multiple responses (currently not all of them can do this).
`Rank`	An integer from the set {0,...,`min(nrow(y), ncol(y))`}. This is the dimension of the fit in terms of the interaction. For `grc()` this argument must be positive. A value of 0 means no interactions (i.e., main effects only); each row and column is represented by an indicator variable.
`weights`	Prior weights. Fed into `rrvglm` or `vglm`.
`which.linpred`	Single integer. Specifies which linear predictor is modelled as the sum of an intercept, row effect, column effect plus an optional interaction term. It should be one value from the set `1:M1`.
`Index.corner`	A vector of `Rank` integers. These are used to store the `Rank` by `Rank` identity matrix in the `A` matrix; corner constraints are used.
`rprefix`, `cprefix`, `iprefix`	Character, for rows and columns and interactions respectively. For labelling the indicator variables.
`offset`	Numeric. Either a matrix of the right dimension, else a single numeric expanded into such a matrix.
`str0`	Ignored if `Rank = 0`, else an integer from the set {1,...,`min(nrow(y), ncol(y))`}, specifying the row that is used as the structural zero. Passed into `rrvglm.control` if `Rank > 0`. Set `str0 = NULL` for none.
`summary.arg`	Logical. If `TRUE` then a summary is returned. If `TRUE` then `y` may be the output (fitted object) of `grc()`.
`h.step`	A small positive value that is passed into `summary.rrvglm()`. Only used when `summary.arg = TRUE`.
`...`	Arguments that are passed into `rrvglm.control()`.
`M1`	The number of linear predictors of the VGAM `family` function for an ordinary (univariate) response. Then the number of linear predictors of the `rcim()` fit is usually the number of columns of `y` multiplied by `M1`. The default is to evaluate the `infos` slot of the VGAM `family` function to try to evaluate it; see `vglmff-class`. If this information is not yet supplied by the family function then the value needs to be inputted manually using this argument.
`rbaseline`, `cbaseline`	Baseline reference levels for the rows and columns. Currently stored on the object but not used.
`has.intercept`	Logical. Include an intercept?
`M`, `cindex`	`M` is the usual VGAM `M`, viz. the number of linear/additive predictors in total. Also, `cindex` means column index, and these point to the columns of `y` which are part of the vector of linear/additive predictor main effects. For `family = multinomial` it is necessary to input these arguments as `M = ncol(y)-1` and `cindex = 2:(ncol(y)-1)`.
`rindex`, `iindex`	`rindex` means row index, and these are similar to `cindex`. `iindex` means interaction index, and these are similar to `cindex`.

Details

Goodman's RC association model fits a reduced-rank approximation to a table of counts. A Poisson model is assumed. The log of each cell mean is decomposed as an intercept plus a row effect plus a column effect plus a reduced-rank component. The latter can be collectively written A %*% t(C), the product of two ‘thin’ matrices. Indeed, A and C have Rank columns. By default, the first column and row of the interaction matrix A %*% t(C) is chosen to be structural zeros, because str0 = 1. This means the first row of A are all zeros.

This function uses options()$contrasts to set up the row and column indicator variables. In particular, Equation (4.5) of Yee and Hastie (2003) is used. These are called Row. and Col. (by default) followed by the row or column number.

The function rcim() is more general than grc(). Its default is a no-interaction model of grc(), i.e., rank-0 and a Poisson distribution. This means that each row and column has a dummy variable associated with it. The first row and first column are baseline. The power of rcim() is that many VGAM family functions can be assigned to its family argument. For example, uninormal fits something in between a 2-way ANOVA with and without interactions, alaplace2 with Rank = 0 is something like medpolish. Others include zipoissonff and negbinomial. Hopefully one day all VGAM family functions will work when assigned to the family argument, although the result may not have meaning.

Unconstrained quadratic ordination (UQO) can be performed using rcim() and grc(). This has been called unconstrained Gaussian ordination in the literature, however the word Gaussian has two meanings which is confusing; it is better to use quadratic because the bell-shape response surface is meant. UQO is similar to CQO (cqo) except there are no environmental/explanatory variables. Here, a GLM is fitted to each column (species) that is a quadratic function of hypothetical latent variables or gradients. Thus each row of the response has an associated site score, and each column of the response has an associated optimum and tolerance matrix. UQO can be performed here under the assumption that all species have the same tolerance matrices. See Yee and Hadi (2014) for details. It is not recommended that presence/absence data be inputted because the information content is so low for each site-species cell. The example below uses Poisson counts.

Value

An object of class "grc", which currently is the same as an "rrvglm" object. Currently, a rank-0 rcim() object is of class rcim0-class, else of class "rcim" (this may change in the future).

Warning

The function rcim() is experimental at this stage and may have bugs. Quite a lot of expertise is needed when fitting and in its interpretion thereof. For example, the constraint matrices applies the reduced-rank regression to the first (see which.linpred) linear predictor and the other linear predictors are intercept-only and have a common value throughout the entire data set. This means that, by default, family = zipoissonff is appropriate but not family = zipoisson. Else set family = zipoisson and which.linpred = 2. To understand what is going on, do examine the constraint matrices of the fitted object, and reconcile this with Equations (4.3) to (4.5) of Yee and Hastie (2003).

The functions temporarily create a permanent data frame called .grc.df or .rcim.df, which used to be needed by summary.rrvglm(). Then these data frames are deleted before exiting the function. If an error occurs then the data frames may be present in the workspace.

Note

These functions set up the indicator variables etc. before calling rrvglm or vglm. The ... is passed into rrvglm.control or vglm.control, This means, e.g., Rank = 1 is default for grc().

The data should be labelled with rownames and colnames. Setting trace = TRUE is recommended to monitor convergence. Using criterion = "coefficients" can result in slow convergence.

If summary = TRUE then y can be a "grc" object, in which case a summary can be returned. That is, grc(y, summary = TRUE) is equivalent to summary(grc(y)). It is not possible to plot a grc(y, summary = TRUE) or rcim(y, summary = TRUE) object.

Author(s)

Thomas W. Yee, with assistance from Alfian F. Hadi.

References

Yee, T. W. and Hastie, T. J. (2003). Reduced-rank vector generalized linear models. Statistical Modelling, 3, 15–41.

Yee, T. W. and Hadi, A. F. (2014). Row-column interaction models, with an R implementation. Computational Statistics, 29, 1427–1445.

Goodman, L. A. (1981). Association models and canonical correlation in the analysis of cross-classifications having ordered categories. Journal of the American Statistical Association, 76, 320–334.

Examples

## Not run: 
# Example 1: Undergraduate enrolments at Auckland University in 1990
fitted(grc1 <- grc(auuc))
summary(grc1)

grc2 <- grc(auuc, Rank = 2, Index.corner = c(2, 5))
fitted(grc2)
summary(grc2)

model3 <- rcim(auuc, Rank = 1, fam = multinomial,
               M = ncol(auuc)-1, cindex = 2:(ncol(auuc)-1), trace = TRUE)
fitted(model3)
summary(model3)

# Median polish but not 100 percent reliable. Maybe call alaplace2()...
rcim0 <- rcim(auuc, fam = alaplace1(tau = 0.5), trace=FALSE, maxit = 500)
round(fitted(rcim0), digits = 0)
round(100 * (fitted(rcim0) - auuc) / auuc, digits = 0)  # Discrepancy
depvar(rcim0)
round(coef(rcim0, matrix = TRUE), digits = 2)
Coef(rcim0, matrix = TRUE)
# constraints(rcim0)
names(constraints(rcim0))

# Compare with medpolish():
(med.a <- medpolish(auuc))
fv <- med.a$overall + outer(med.a$row, med.a$col, "+")
round(100 * (fitted(rcim0) - fv) / fv)  # Hopefully should be all 0s


# Example 2: 2012 Summer Olympic Games in London
top10 <- head(olym12, 10)
grc1.oly12 <- with(top10, grc(cbind(gold, silver, bronze)))
round(fitted(grc1.oly12))
round(resid(grc1.oly12, type = "response"), digits = 1)  # Resp. resids
summary(grc1.oly12)
Coef(grc1.oly12)


# Example 3: UQO; see Yee and Hadi (2014)
n <- 100; p <- 5; S <- 10
pdata <- rcqo(n, p, S, es.opt = FALSE, eq.max = FALSE,
              eq.tol = TRUE, sd.latvar = 0.75)  # Poisson counts
true.nu <- attr(pdata, "latvar")  # The 'truth'; site scores
attr(pdata, "tolerances")  # The 'truth'; tolerances

Y <- Select(pdata, "y", sort = FALSE)  # Y matrix (n x S); the "y" vars
uqo.rcim1 <- rcim(Y, Rank = 1,
                  str0 = NULL,  # Delta covers entire n x M matrix
                  iindex = 1:nrow(Y),  # RRR covers the entire Y
                  has.intercept = FALSE)  # Suppress the intercept

# Plot 1
par(mfrow = c(2, 2))
plot(attr(pdata, "optimums"), Coef(uqo.rcim1)@A,
     col = "blue", type = "p", main = "(a) UQO optimums",
     xlab = "True optimums", ylab = "Estimated (UQO) optimums")
mylm <- lm(Coef(uqo.rcim1)@A ~ attr(pdata, "optimums"))
abline(coef = coef(mylm), col = "orange", lty = "dashed")

# Plot 2
fill.val <- NULL  # Choose this for the new parameterization
plot(attr(pdata, "latvar"), c(fill.val, concoef(uqo.rcim1)),
     las = 1, col = "blue", type = "p", main = "(b) UQO site scores",
     xlab = "True site scores", ylab = "Estimated (UQO) site scores" )
mylm <- lm(c(fill.val, concoef(uqo.rcim1)) ~ attr(pdata, "latvar"))
abline(coef = coef(mylm), col = "orange", lty = "dashed")

# Plots 3 and 4
myform <- attr(pdata, "formula")
p1ut <- cqo(myform, family = poissonff,
            eq.tol = FALSE, trace = FALSE, data = pdata)
c1ut <- cqo(Select(pdata, "y", sort = FALSE) ~ scale(latvar(uqo.rcim1)),
        family = poissonff, eq.tol = FALSE, trace = FALSE, data = pdata)
lvplot(p1ut, lcol = 1:S, y = TRUE, pcol = 1:S, pch = 1:S, pcex = 0.5,
       main = "(c) CQO fitted to the original data",
       xlab = "Estimated (CQO) site scores")
lvplot(c1ut, lcol = 1:S, y = TRUE, pcol = 1:S, pch = 1:S, pcex = 0.5,
       main = "(d) CQO fitted to the scaled UQO site scores",
       xlab = "Estimated (UQO) site scores")

## End(Not run)

VGAM documentation built on Dec. 4, 2025, 1:07 a.m.