procD.lm | R Documentation |
Function performs Procrustes ANOVA with permutation procedures to assess statistical hypotheses describing patterns of shape variation and covariation for a set of Procrustes shape variables
procD.lm(
f1,
iter = 999,
seed = NULL,
RRPP = TRUE,
SS.type = c("I", "II", "III"),
effect.type = c("F", "cohenf", "SS", "MS", "Rsq"),
int.first = FALSE,
Cov = NULL,
turbo = TRUE,
Parallel = FALSE,
data = NULL,
print.progress = FALSE,
...
)
f1 |
A formula for the linear model (e.g., y~x1+x2) |
iter |
Number of iterations for significance testing |
seed |
An optional argument for setting the seed for random permutations of the resampling procedure. If left NULL (the default), the exact same P-values will be found for repeated runs of the analysis (with the same number of iterations). If seed = "random", a random seed will be used, and P-values will vary. One can also specify an integer for specific seed values, which might be of interest for advanced users. |
RRPP |
A logical value indicating whether residual randomization should be used for significance testing |
SS.type |
SS.type A choice between type I (sequential), type II (hierarchical), or type III (marginal) sums of squares and cross-products computations. |
effect.type |
One of "F", "SS", or "cohen", to choose from which random distribution to estimate effect size. (The option, "cohen", is for Cohen's f-squared values. The default is "F". Values are log-transformed before z-score calculation to assure normally distributed data.) |
int.first |
A logical value to indicate if interactions of first main effects should precede subsequent main effects |
Cov |
An optional covariance matrix that can be used for generalized least squares estimates of coefficients and sums of squares and cross-products (see Adams and Collyer 2018). |
turbo |
A logical value that if TRUE, suppresses coefficient estimation
in every random permutation. This will affect subsequent analyses that
require random coefficients (see |
Parallel |
Either a logical value to indicate whether parallel processing
should be used or a numeric value to indicate the number of cores to use in
parallel processing via the |
data |
A data frame for the function environment, see |
print.progress |
A logical value to indicate whether a progress bar should be printed to the screen. This is helpful for long-running analyses. |
... |
Arguments not listed above that can passed on to |
The function quantifies the relative amount of shape variation attributable to one or more factors in a
linear model and estimates the probability of this variation ("significance") for a null model, via distributions generated
from resampling permutations. Data input is specified by a formula (e.g.,
y~X), where 'y' specifies the response variables (Procrustes shape variables), and 'X' contains one or more independent
variables (discrete or continuous). The response matrix 'y' can be either in the form of a two-dimensional data
matrix of dimension (n x [p x k]), or a 3D array (p x n x k). It is assumed that -if the data based
on landmark coordinates - the landmarks have previously been aligned using Generalized Procrustes Analysis (GPA)
[e.g., with gpagen
].
The names specified for the independent (x) variables in the formula represent one or more
vectors containing continuous data or factors. It is assumed that the order of the specimens in the
shape matrix matches the order of values in the independent variables. Linear model fits (using the lm
function)
can also be input in place of a formula. Arguments for lm
can also be passed on via this function.
The function two.d.array
can be used to obtain a two-dimensional data matrix from a 3D array of landmark
coordinates; however this step is no longer necessary, as procD.lm can receive 3D arrays as dependent variables. It is also
recommended that geomorph.data.frame
is used to create and input a data frame. This will reduce problems caused
by conflicts between the global and function environments. In the absence of a specified data frame, procD.lm will attempt to
coerce input data into a data frame, but success is not guaranteed.
The function performs statistical assessment of the terms in the model using Procrustes distances among specimens, rather than explained covariance matrices among variables. With this approach, the sum-of-squared Procrustes distances are used as a measure of SS (see Goodall 1991). The observed SS are evaluated through permutation. In morphometrics this approach is known as a Procrustes ANOVA (Goodall 1991), which is equivalent to distance-based anova designs (Anderson 2001). Two possible resampling procedures are provided. First, if RRPP=FALSE, the rows of the matrix of shape variables are randomized relative to the design matrix. This is analogous to a 'full' randomization. Second, if RRPP=TRUE, a residual randomization permutation procedure is utilized (Collyer et al. 2015). Here, residual shape values from a reduced model are obtained, and are randomized with respect to the linear model under consideration. These are then added to predicted values from the remaining effects to obtain pseudo-values from which SS are calculated. NOTE: for single-factor designs, the two approaches are identical. However, when evaluating factorial models it has been shown that RRPP attains higher statistical power and thus has greater ability to identify patterns in data should they be present (see Anderson and terBraak 2003).
Effect-sizes (Z scores) are computed as standard deviates of either the SS, F, or Cohen's f-squared sampling distributions generated, which might be more intuitive for P-values than F-values (see Collyer et al. 2015). Values from these distributions are log-transformed prior to effect size estimation, to assure normally distributed data. The SS type will influence how Cohen's f-squared values are calculated. Cohen's f-squared values are based on partial eta-squared values that can be calculated sequentially or marginally, as with SS.
In the case that multiple factor or factor-covariate interactions are used in the model formula, one can specify whether all main effects should be added to the model first, or interactions should precede subsequent main effects (i.e., Y ~ a + b + c + a:b + ..., or Y ~ a + b + a:b + c + ..., respectively.)
The generic functions, print
, summary
, and plot
all work with procD.lm
.
The generic function, plot
has several options for plotting, using plot.procD.lm
. Diagnostics plots,
principal component plots (rotated to first PC of covariance matrix of fitted values), and regression plots can be performed.
One must provide a linear predictor, and
can choose among predicted values (PredLine) or regression scores (RegScore).
In these plotting options, the predictor does not need to be size, and fitted values and residuals from the procD.lm fit are used rather
than mean-centered values.
The procD.lm function is now a wrapper for the lm.rrpp
function
in the RRPP
package. Examples below illustrate how to utilize
RRPP
functions along with geomorph
functions for procD.lm objects,
increasing the breadth of possible downstream analyses.
An important update in version 3.1.0 is that advanced.procD.lm and nested.update have been deprecated.
The examples emphasize how pairwise comparisons can now be accomplished with pairwise
and
ANOVA updates for nested factors can be made with the anova.lm.rrpp
, utilizing the error argument.
These functions work on procD.lm objects that have already been created.
Compared to previous versions, GLS computations in random permutations are now possible in procD.lm. One should use RRPP = TRUE if a covariance matrix is provided as an argument. The method of SS calculations follows Adams and Collyer 2018. Additional output with a "gls." prefix is also available.
Compared to previous versions of geomorph, users might notice differences in effect sizes. Previous versions used z-scores calculated with expected values of statistics from null hypotheses (sensu Collyer et al. 2015); however Adams and Collyer (2016) showed that expected values for some statistics can vary with sample size and variable number, and recommended finding the expected value, empirically, as the mean from the set of random outcomes. Geomorph 3.0.4 and subsequent versions now center z-scores on their empirically estimated expected values and where appropriate, log-transform values to assure statistics are normally distributed. This can result in negative effect sizes, when statistics are smaller than expected compared to the average random outcome. For ANOVA-based functions, the option to choose among different statistics to measure effect size is now a function argument.
An object of class "procD.lm" is a list containing the following
aov.table |
An analysis of variance table; the same as the summary. |
call |
The matched call. |
coefficients |
A vector or matrix of linear model coefficients. |
Y |
The response data, in matrix form. |
X |
The model matrix. |
QR |
The QR decompositions of the model matrix. |
fitted |
The fitted values. |
residuals |
The residuals (observed responses - fitted responses). |
weights |
The weights used in weighted least-squares fitting. If no weights are used, NULL is returned. |
Terms |
The results of the |
term.labels |
The terms used in constructing the aov.table. |
data |
The data frame for the model. |
SS |
The sums of squares for each term, model residuals, and the total. |
SS.type |
The type of sums of squares. One of type I or type III. |
df |
The degrees of freedom for each SS. |
R2 |
The coefficient of determination for each model term. |
F |
The F values for each model term. |
permutations |
The number of random permutations (including observed) used. |
random.SS |
A matrix of random SS found via the resampling procedure used. |
random.F |
A matrix or vector of random F values found via the resampling procedure used. |
random.cohenf |
A matrix or vector of random Cohen's f-squared values found via the resampling procedure used. |
permutations |
The number of random permutations (including observed) used. |
effect.type |
The distribution used to estimate effect-size. |
perm.method |
A value indicating whether "Raw" values were shuffled or "RRPP" performed. |
gls |
This prefix will be used if a covariance matrix is provided to indicate GLS computations. |
Dean Adams and Michael Collyer
Anderson MJ. 2001. A new method for non-parametric multivariate analysis of variance. Austral Ecology 26: 32-46.
Anderson MJ. and C.J.F. terBraak. 2003. Permutation tests for multi-factorial analysis of variance. Journal of Statistical Computation and Simulation 73: 85-113.
Goodall, C.R. 1991. Procrustes methods in the statistical analysis of shape. Journal of the Royal Statistical Society B 53:285-339.
Collyer, M.L., D.J. Sekora, and D.C. Adams. 2015. A method for analysis of phenotypic change for phenotypes described by high-dimensional data. Heredity. 115:357-365.
Adams, D.C. and M.L. Collyer. 2016. On the comparison of the strength of morphological integration across morphometric datasets. Evolution. 70:2623-2631.
Adams, D.C. and M.L. Collyer. 2018. Multivariate comparative methods: evaluations, comparisons, and recommendations. Systematic Biology. 67:14-31.
procD.pgls
and
lm.rrpp
for more on linear model fits with RRPP. See RRPP
for further
details on functions that can use procD.lm objects.
## Not run:
### ANOVA example for Goodall's F test (multivariate shape vs. factors)
data(plethodon)
Y.gpa <- gpagen(plethodon$land) #GPA-alignment
gdf <- geomorph.data.frame(Y.gpa,
site = plethodon$site,
species = plethodon$species) # geomorph data frame
fit1 <- procD.lm(coords ~ species * site,
data = gdf, iter = 999, turbo = TRUE,
RRPP = FALSE, print.progress = FALSE) # randomize raw values
fit2 <- procD.lm(coords ~ species * site,
data = gdf, iter = 999, turbo = TRUE,
RRPP = TRUE, print.progress = FALSE) # randomize residuals
summary(fit1)
summary(fit2)
# RRPP example applications
coef(fit2)
coef(fit2, test = TRUE)
anova(fit2) # same as summary
anova(fit2, effect.type = "Rsq")
# if species and site were modeled as random effects ...
anova(fit2, error = c("species:site", "species:site", "Residuals"))
# not run, because it is a large object to print
# remove # to run
# predict(fit2)
# TPS plots for fitted values of some specimens
M <- Y.gpa$consensus
plotRefToTarget(M, fit2$GM$fitted[,,1], mag = 3)
plotRefToTarget(M, fit2$GM$fitted[,,20], mag = 3)
### THE FOLLOWING ILLUSTRATES SIMPLER SOLUTIONS FOR
### THE NOW DEPRECATED advanced.procD.lm FUNCTION AND
### PERFORM ANALYSES ALSO FOUND VIA THE morphol.disparity FUNCTION
### USING THE pairwise FUNCTION
# Comparison of LS means, with log(Csize) as a covariate
# Model fits
fit.null <- procD.lm(coords ~ log(Csize) + species + site, data = gdf,
iter = 999, print.progress = FALSE, turbo = TRUE)
fit.full <- procD.lm(coords ~ log(Csize) + species * site, data = gdf,
iter = 999, print.progress = FALSE, turbo = TRUE)
# ANOVA, using anova.lm.rrpp function from the RRPP package
# (replaces advanced.procD.lm)
anova(fit.null, fit.full, print.progress = FALSE)
# Pairwise tests, using pairwise function from the RRPP package
gp <- interaction(gdf$species, gdf$site)
PW <- pairwise(fit.full, groups = gp, covariate = NULL)
# Pairwise distances between means, summarized two ways
# (replaces advanced.procD.lm):
summary(PW, test.type = "dist", confidence = 0.95, stat.table = TRUE)
summary(PW, test.type = "dist", confidence = 0.95, stat.table = FALSE)
# Pairwise comaprisons of group variances, two ways
# (same as morphol.disaprity):
summary(PW, test.type = "var", confidence = 0.95, stat.table = TRUE)
summary(PW, test.type = "var", confidence = 0.95, stat.table = FALSE)
morphol.disparity(fit.full, groups = gp, iter = 999)
### Regression example
data(ratland)
rat.gpa <- gpagen(ratland) #GPA-alignment
gdf <- geomorph.data.frame(rat.gpa) # geomorph data frame is easy
# without additional input
fit <- procD.lm(coords ~ Csize, data = gdf, iter = 999, turbo = TRUE,
RRPP = TRUE, print.progress = FALSE)
summary(fit)
### Extracting objects and plotting options
# diagnostic plots
plot(fit, type = "diagnostics")
# diagnostic plots, including plotOutliers
plot(fit, type = "diagnostics", outliers = TRUE)
# PC plot rotated to major axis of fitted values
plot(fit, type = "PC", pch = 19, col = "blue")
# Regression-type plots
# Use fitted values from the model to make prediction lines
plot(fit, type = "regression",
predictor = gdf$Csize, reg.type = "RegScore",
pch = 19, col = "green")
# Uses coefficients from the model to find the projected regression
# scores
rat.plot <- plot(fit, type = "regression",
predictor = gdf$Csize, reg.type = "RegScore",
pch = 21, bg = "yellow")
# TPS grids for min and max scores in previous plot
preds <- shape.predictor(fit$GM$fitted, x = rat.plot$RegScore,
predmin = min(rat.plot$RegScore),
predmax = max(rat.plot$RegScore))
M <- rat.gpa$consensus
plotRefToTarget(M, preds$predmin, mag=2)
plotRefToTarget(M, preds$predmax, mag=2)
attributes(fit)
fit$fitted[1:3, ] # the fitted values (first three specimens)
fit$GM$fitted[,, 1:3] # the fitted values as Procrustes
# coordinates (same specimens)
### THE FOLLOWING ILLUSTRATES SIMPLER SOLUTIONS FOR
### THE NOW DEFUNCT nested.update FUNCTION USING
### THE anova GENERIC FUNCTION
data("larvalMorph")
Y.gpa <- gpagen(larvalMorph$tailcoords,
curves = larvalMorph$tail.sliders,
ProcD = TRUE, print.progress = FALSE)
gdf <- geomorph.data.frame(Y.gpa, treatment = larvalMorph$treatment,
family = larvalMorph$family)
fit <- procD.lm(coords ~ treatment/family, data = gdf, turbo = TRUE,
print.progress = FALSE, iter = 999)
anova(fit) # treatment effect not adjusted
anova(fit, error = c("treatment:family", "Residuals"))
# treatment effect updated (adjusted)
## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.