Fits a Goodman's RC association model (GRC) to a matrix of counts, and more generally, row-column interaction models (RCIMs). RCIMs allow for unconstrained quadratic ordination (UQO).
1 2 3 4 5 6 7 8 9 10 11
grc(y, Rank = 1, Index.corner = 2:(1 + Rank), str0 = 1, summary.arg = FALSE, h.step = 1e-04, ...) rcim(y, family = poissonff, Rank = 0, M1 = NULL, weights = NULL, which.linpred = 1, Index.corner = ifelse(is.null(str0), 0, max(str0)) + 1:Rank, rprefix = "Row.", cprefix = "Col.", iprefix = "X2.", offset = 0, str0 = if (Rank) 1 else NULL, summary.arg = FALSE, h.step = 0.0001, rbaseline = 1, cbaseline = 1, has.intercept = TRUE, M = NULL, rindex = 2:nrow(y), cindex = 2:ncol(y), iindex = 2:nrow(y), ...)
A VGAM family function.
By default, the first linear/additive predictor
using main effects plus an optional rank-
An integer from the set
Prior weights. Fed into
Specifies which linear predictor is modelled as the sum of an
intercept, row effect, column effect plus an optional interaction
term. It should be one value from the set
A vector of
Character, for rows and columns and interactions respectively. For labelling the indicator variables.
Numeric. Either a matrix of the right dimension, else a single numeric expanded into such a matrix.
A small positive value that is passed into
Arguments that are passed
The number of linear predictors of the VGAM
Baseline reference levels for the rows and columns. Currently stored on the object but not used.
Logical. Include an intercept?
M is the usual VGAM M,
viz. the number of linear/additive
predictors in total.
Goodman's RC association model fits a reduced-rank approximation
to a table of counts.
A Poisson model is assumed.
The log of each cell mean is decomposed as an
intercept plus a row effect plus a column effect plus a reduced-rank
component. The latter can be collectively written
A %*% t(C),
the product of two ‘thin’ matrices.
By default, the first column and row of the interaction matrix
A %*% t(C) is chosen
to be structural zeros, because
str0 = 1.
This means the first row of
A are all zeros.
This function uses
options()$contrasts to set up the row and
column indicator variables.
In particular, Equation (4.5) of Yee and Hastie (2003) is used.
These are called
Col. (by default) followed
by the row or column number.
rcim() is more general than
Its default is a no-interaction model of
rank-0 and a Poisson distribution. This means that each
row and column has a dummy variable associated with it.
The first row and first column are baseline.
The power of
rcim() is that many VGAM family functions
can be assigned to its
uninormal fits something in between a 2-way
ANOVA with and without interactions,
Rank = 0 is something like
Hopefully one day all VGAM family functions will
work when assigned to the
family argument, although the
result may not have meaning.
Unconstrained quadratic ordination (UQO) can be performed
This has been called unconstrained Gaussian ordination
in the literature, however the word Gaussian has two
meanings which is confusing; it is better to use
quadratic because the bell-shape response surface is meant.
UQO is similar to CQO (
cqo) except there are
no environmental/explanatory variables.
Here, a GLM is fitted to each column (species)
that is a quadratic function of hypothetical latent variables
Thus each row of the response has an associated site score,
and each column of the response has an associated optimum
and tolerance matrix.
UQO can be performed here under the assumption that all species
have the same tolerance matrices.
See Yee and Hadi (2014) for details.
It is not recommended that presence/absence data be inputted
because the information content is so low for each site-species
The example below uses Poisson counts.
An object of class
"grc", which currently is the same as
rcim() object is of class
else of class
"rcim" (this may change in the future).
rcim() is experimental at this stage and
may have bugs.
Quite a lot of expertise is needed when fitting and in its
interpretion thereof. For example, the constraint
matrices applies the reduced-rank regression to the first
linear predictor and the other linear predictors are intercept-only
and have a common value throughout the entire data set.
This means that, by default,
appropriate but not
which.linpred = 2.
To understand what is going on, do examine the constraint
matrices of the fitted object, and reconcile this with
Equations (4.3) to (4.5) of Yee and Hastie (2003).
The functions temporarily create a permanent data frame
.rcim.df, which used
to be needed by
summary.rrvglm(). Then these
data frames are deleted before exiting the function.
If an error occurs then the data frames may be present
in the workspace.
These functions set up the indicator variables etc. before calling
... is passed into
This means, e.g.,
Rank = 1 is default for
The data should be labelled with
trace = TRUE is recommended to monitor
criterion = "coefficients" can result in slow convergence.
summary = TRUE then
y can be a
"grc" object, in which case a summary can be returned.
grc(y, summary = TRUE) is
It is not possible to plot a
grc(y, summary = TRUE) or
rcim(y, summary = TRUE) object.
Thomas W. Yee, with assistance from Alfian F. Hadi.
Yee, T. W. and Hastie, T. J. (2003). Reduced-rank vector generalized linear models. Statistical Modelling, 3, 15–41.
Yee, T. W. and Hadi, A. F. (2014). Row-column interaction models, with an R implementation. Computational Statistics, 29, 1427–1445.
Goodman, L. A. (1981). Association models and canonical correlation in the analysis of cross-classifications having ordered categories. Journal of the American Statistical Association, 76, 320–334.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87
# Example 1: Undergraduate enrolments at Auckland University in 1990 fitted(grc1 <- grc(auuc)) summary(grc1) grc2 <- grc(auuc, Rank = 2, Index.corner = c(2, 5)) fitted(grc2) summary(grc2) model3 <- rcim(auuc, Rank = 1, fam = multinomial, M = ncol(auuc)-1, cindex = 2:(ncol(auuc)-1), trace = TRUE) fitted(model3) summary(model3) # Median polish but not 100 percent reliable. Maybe call alaplace2()... ## Not run: rcim0 <- rcim(auuc, fam = alaplace1(tau = 0.5), trace=FALSE, maxit = 500) round(fitted(rcim0), digits = 0) round(100 * (fitted(rcim0) - auuc) / auuc, digits = 0) # Discrepancy depvar(rcim0) round(coef(rcim0, matrix = TRUE), digits = 2) Coef(rcim0, matrix = TRUE) # constraints(rcim0) names(constraints(rcim0)) # Compare with medpolish(): (med.a <- medpolish(auuc)) fv <- med.a$overall + outer(med.a$row, med.a$col, "+") round(100 * (fitted(rcim0) - fv) / fv) # Hopefully should be all 0s ## End(Not run) # Example 2: 2012 Summer Olympic Games in London ## Not run: top10 <- head(olym12, 10) grc1.oly12 <- with(top10, grc(cbind(gold, silver, bronze))) round(fitted(grc1.oly12)) round(resid(grc1.oly12, type = "response"), digits = 1) # Resp. resids summary(grc1.oly12) Coef(grc1.oly12) ## End(Not run) # Example 3: UQO; see Yee and Hadi (2014) ## Not run: n <- 100; p <- 5; S <- 10 pdata <- rcqo(n, p, S, es.opt = FALSE, eq.max = FALSE, eq.tol = TRUE, sd.latvar = 0.75) # Poisson counts true.nu <- attr(pdata, "latvar") # The 'truth'; site scores attr(pdata, "tolerances") # The 'truth'; tolerances Y <- Select(pdata, "y", sort = FALSE) # Y matrix (n x S); the "y" vars uqo.rcim1 <- rcim(Y, Rank = 1, str0 = NULL, # Delta covers entire n x M matrix iindex = 1:nrow(Y), # RRR covers the entire Y has.intercept = FALSE) # Suppress the intercept # Plot 1 par(mfrow = c(2, 2)) plot(attr(pdata, "optimums"), Coef(uqo.rcim1)@A, col = "blue", type = "p", main = "(a) UQO optimums", xlab = "True optimums", ylab = "Estimated (UQO) optimums") mylm <- lm(Coef(uqo.rcim1)@A ~ attr(pdata, "optimums")) abline(coef = coef(mylm), col = "orange", lty = "dashed") # Plot 2 fill.val <- NULL # Choose this for the new parameterization plot(attr(pdata, "latvar"), c(fill.val, concoef(uqo.rcim1)), las = 1, col = "blue", type = "p", main = "(b) UQO site scores", xlab = "True site scores", ylab = "Estimated (UQO) site scores" ) mylm <- lm(c(fill.val, concoef(uqo.rcim1)) ~ attr(pdata, "latvar")) abline(coef = coef(mylm), col = "orange", lty = "dashed") # Plots 3 and 4 myform <- attr(pdata, "formula") p1ut <- cqo(myform, family = poissonff, eq.tol = FALSE, trace = FALSE, data = pdata) c1ut <- cqo(Select(pdata, "y", sort = FALSE) ~ scale(latvar(uqo.rcim1)), family = poissonff, eq.tol = FALSE, trace = FALSE, data = pdata) lvplot(p1ut, lcol = 1:S, y = TRUE, pcol = 1:S, pch = 1:S, pcex = 0.5, main = "(c) CQO fitted to the original data", xlab = "Estimated (CQO) site scores") lvplot(c1ut, lcol = 1:S, y = TRUE, pcol = 1:S, pch = 1:S, pcex = 0.5, main = "(d) CQO fitted to the scaled UQO site scores", xlab = "Estimated (UQO) site scores") ## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.