rcqo: Constrained Quadratic Ordination

Description Usage Arguments Details Value Note Author(s) References See Also Examples

View source: R/family.rcqo.R

Description

Random generation for constrained quadratic ordination (CQO).

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
rcqo(n, p, S, Rank = 1,
     family = c("poisson", "negbinomial", "binomial-poisson",
                "Binomial-negbinomial", "ordinal-poisson",
                "Ordinal-negbinomial", "gamma2"),
     eq.maximums = FALSE, eq.tolerances = TRUE, es.optimums = FALSE,
     lo.abundance = if (eq.maximums) hi.abundance else 10,
     hi.abundance = 100, sd.latvar = head(1.5/2^(0:3), Rank),
     sd.optimums = ifelse(es.optimums, 1.5/Rank, 1) *
                       ifelse(scale.latvar, sd.latvar, 1),
     sd.tolerances = 0.25, Kvector = 1, Shape = 1,
     sqrt.arg = FALSE, log.arg = FALSE, rhox = 0.5, breaks = 4,
     seed = NULL, optimums1.arg = NULL, Crow1positive = TRUE,
     xmat = NULL, scale.latvar = TRUE)

Arguments

n

Number of sites. It is denoted by n below.

p

Number of environmental variables, including an intercept term. It is denoted by p below. Must be no less than 1+R in value.

S

Number of species. It is denoted by S below.

Rank

The rank or the number of latent variables or true dimension of the data on the reduced space. This must be either 1, 2, 3 or 4. It is denoted by R.

family

What type of species data is to be returned. The first choice is the default. If binomial then a 0 means absence and 1 means presence. If ordinal then the breaks argument is passed into the breaks argument of cut. Note that either the Poisson or negative binomial distributions are used to generate binomial and ordinal data, and that an upper-case choice is used for the negative binomial distribution (this makes it easier for the user). If "gamma2" then this is the 2-parameter gamma distribution.

eq.maximums

Logical. Does each species have the same maximum? See arguments lo.abundance and hi.abundance.

eq.tolerances

Logical. Does each species have the same tolerance? If TRUE then the common value is 1 along every latent variable, i.e., all species' tolerance matrices are the order-R identity matrix.

es.optimums

Logical. Do the species have equally spaced optimums? If TRUE then the quantity S^(1/R) must be an integer with value 2 or more. That is, there has to be an appropriate number of species in total. This is so that a grid of optimum values is possible in R-dimensional latent variable space in order to place the species' optimums. Also see the argument sd.tolerances.

lo.abundance, hi.abundance

Numeric. These are recycled to a vector of length S. The species have a maximum between lo.abundance and hi.abundance. That is, at their optimal environment, the mean abundance of each species is between the two componentwise values. If eq.maximums is TRUE then lo.abundance and hi.abundance must have the same values. If eq.maximums is FALSE then the logarithm of the maximums are uniformly distributed between log(lo.abundance) and log(hi.abundance).

sd.latvar

Numeric, of length R (recycled if necessary). Site scores along each latent variable have these standard deviation values. This must be a decreasing sequence of values because the first ordination axis contains the greatest spread of the species' site scores, followed by the second axis, followed by the third axis, etc.

sd.optimums

Numeric, of length R (recycled if necessary). If es.optimums = FALSE then, for the rth latent variable axis, the optimums of the species are generated from a normal distribution centered about 0. If es.optimums = TRUE then the S optimums are equally spaced about 0 along every latent variable axis. Regardless of the value of es.optimums, the optimums are then scaled to give standard deviation sd.optimums[r].

sd.tolerances

Logical. If eq.tolerances = FALSE then, for the rth latent variable, the species' tolerances are chosen from a normal distribution with mean 1 and standard deviation sd.tolerances[r]. However, the first species y1 has its tolerance matrix set equal to the order-R identity matrix. All tolerance matrices for all species are diagonal in this function. This argument is ignored if eq.tolerances is TRUE, otherwise it is recycled to length R if necessary.

Kvector

A vector of positive k values (recycled to length S if necessary) for the negative binomial distribution (see negbinomial for details). Note that a natural default value does not exist, however the default value here is probably a realistic one, and that for large values of μ one has Var(Y) = mu^2 / k approximately.

Shape

A vector of positive lambda values (recycled to length S if necessary) for the 2-parameter gamma distribution (see gamma2 for details). Note that a natural default value does not exist, however the default value here is probably a realistic one, and that Var(Y) = mu^2 / lambda.

sqrt.arg

Logical. Take the square-root of the negative binomial counts? Assigning sqrt.arg = TRUE when family="negbinomial" means that the resulting species data can be considered very crudely to be approximately Poisson distributed. They will not integers in general but much easier (less numerical problems) to estimate using something like cqo(..., family="poissonff").

log.arg

Logical. Take the logarithm of the gamma random variates? Assigning log.arg = TRUE when family="gamma2" means that the resulting species data can be considered very crudely to be approximately Gaussian distributed about its (quadratic) mean.

rhox

Numeric, less than 1 in absolute value. The correlation between the environmental variables. The correlation matrix is a matrix of 1's along the diagonal and rhox in the off-diagonals. Note that each environmental variable is normally distributed with mean 0. The standard deviation of each environmental variable is chosen so that the site scores have the determined standard deviation, as given by argument sd.latvar.

breaks

If family is assigned an ordinal value then this argument is used to define the cutpoints. It is fed into the breaks argument of cut.

seed

If given, it is passed into set.seed. This argument can be used to obtain reproducible results. If set, the value is saved as the "seed" attribute of the returned value. The default will not change the random generator state, and return .Random.seed as "seed" attribute.

optimums1.arg

If assigned and Rank = 1 then these are the explicity optimums. Recycled to length S.

Crow1positive

See qrrvglm.control for details.

xmat

The n by p-1 environmental matrix can be inputted.

scale.latvar

Logical. If FALSE the argument sd.latvar is ignored and no scaling of the latent variable values is performed.

Details

This function generates data coming from a constrained quadratic ordination (CQO) model. In particular, data coming from a species packing model can be generated with this function. The species packing model states that species have equal tolerances, equal maximums, and optimums which are uniformly distributed over the latent variable space. This can be achieved by assigning the arguments es.optimums = TRUE, eq.maximums = TRUE, eq.tolerances = TRUE.

At present, the Poisson and negative binomial abundances are generated first using lo.abundance and hi.abundance, and if family is binomial or ordinal then it is converted into these forms.

In CQO theory the n by p matrix X is partitioned into two parts X_1 and X_2. The matrix X_2 contains the ‘real’ environmental variables whereas the variables in X_1 are just for adjustment purposes; they contain the intercept terms and other variables that one wants to adjust for when (primarily) looking at the variables in X_2. This function has X_1 only being a matrix of ones, i.e., containing an intercept only.

Value

A n by p-1+S data frame with components and attributes. In the following the attributes are labelled with double quotes.

x2, x3, x4, ..., xp

The environmental variables. This makes up the n by p-1 X_2 matrix. Note that x1 is not present; it is effectively a vector of ones since it corresponds to an intercept term when cqo is applied to the data.

y1, y2, x3, ..., yS

The species data. This makes up the n by S matrix Y. This will be of the form described by the argument family.

"concoefficients"

The p-1 by R matrix of constrained coefficients (or canonical coefficients). These are also known as weights or loadings.

"formula"

The formula involving the species and environmental variable names. This can be used directly in the formula argument of cqo.

"log.maximums"

The S-vector of species' maximums, on a log scale. These are uniformly distributed between log(lo.abundance) and log(hi.abundance).

"latvar"

The n by R matrix of site scores. Each successive column (latent variable) has sample standard deviation equal to successive values of sd.latvar.

"eta"

The linear/additive predictor value.

"optimums"

The S by R matrix of species' optimums.

"tolerances"

The S by R matrix of species' tolerances. These are the square root of the diagonal elements of the tolerance matrices (recall that all tolerance matrices are restricted to being diagonal in this function).

Other attributes are "break", "family", "Rank", "lo.abundance", "hi.abundance", "eq.tolerances", "eq.maximums", "seed" as used.

Note

This function is under development and is not finished yet. There may be a few bugs.

Yet to do: add an argument that allows absences to be equal to the first level if ordinal data is requested.

Author(s)

T. W. Yee

References

Yee, T. W. (2004). A new technique for maximum-likelihood canonical Gaussian ordination. Ecological Monographs, 74, 685–701.

Yee, T. W. (2006). Constrained additive ordination. Ecology, 87, 203–213.

ter Braak, C. J. F. and Prentice, I. C. (1988). A theory of gradient analysis. Advances in Ecological Research, 18, 271–317.

See Also

cqo, qrrvglm.control, cut, binomialff, poissonff, negbinomial, gamma2.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
## Not run: 
# Example 1: Species packing model:
n <- 100; p <- 5; S <- 5
mydata <- rcqo(n, p, S, es.opt = TRUE, eq.max = TRUE)
names(mydata)
(myform <- attr(mydata, "formula"))
fit <- cqo(myform, poissonff, mydata, Bestof = 3)  # eq.tol = TRUE
matplot(attr(mydata, "latvar"), mydata[,-(1:(p-1))], col = 1:S)
persp(fit, col = 1:S, add = TRUE)
lvplot(fit, lcol = 1:S, y = TRUE, pcol = 1:S)  # The same plot as above

# Compare the fitted model with the 'truth'
concoef(fit)  # The fitted model
attr(mydata, "concoefficients")  # The 'truth'

c(apply(attr(mydata, "latvar"), 2, sd),
  apply(latvar(fit), 2, sd))  # Both values should be approx equal


# Example 2: negative binomial data fitted using a Poisson model:
n <- 200; p <- 5; S <- 5
mydata <- rcqo(n, p, S, fam = "negbin", sqrt = TRUE)
myform <- attr(mydata, "formula")
fit <- cqo(myform, fam = poissonff, dat = mydata)  # I.tol = TRUE,
lvplot(fit, lcol = 1:S, y = TRUE, pcol = 1:S)
# Compare the fitted model with the 'truth'
concoef(fit)  # The fitted model
attr(mydata, "concoefficients")  # The 'truth'

## End(Not run)

Example output

Loading required package: stats4
Loading required package: splines
[1] "x2" "x3" "x4" "x5" "y1" "y2" "y3" "y4" "y5"
cbind(y1, y2, y3, y4, y5) ~ x2 + x3 + x4 + x5
<environment: 0x25088b0>

========================= Fitting model 1 =========================

Obtaining initial values

Using initial values
         x2   x3    x4    x5
latvar 1.56 1.23 0.137 -1.12

Using BFGS algorithm
initial  value 441.294463 
iter  10 value 409.828625
final  value 409.816840 
converged

BFGS using optim(): 
Objective = 409.8168 
Parameters (= c(C)) =  0.7958024 0.6156083 0.05702442 -0.5895017

Number of function evaluations = 182 


========================= Fitting model 2 =========================

Obtaining initial values

Using initial values
         x2   x3    x4    x5
latvar 1.58 1.19 0.177 -1.15

Using BFGS algorithm
initial  value 443.385101 
iter  10 value 409.882812
final  value 409.815912 
converged

BFGS using optim(): 
Objective = 409.8159 
Parameters (= c(C)) =  0.7958039 0.6155422 0.05703384 -0.5893079

Number of function evaluations = 174 


========================= Fitting model 3 =========================

Obtaining initial values

Using initial values
         x2  x3    x4    x5
latvar 1.57 1.2 0.192 -1.19

Using BFGS algorithm
initial  value 437.442335 
iter  10 value 409.816443
final  value 409.815760 
converged

BFGS using optim(): 
Objective = 409.8158 
Parameters (= c(C)) =  0.7957827 0.6155815 0.0570064 -0.5892947

Number of function evaluations = 124 

        latvar
x2  1.20375634
x3  0.93117154
x4  0.08623186
x5 -0.89140829
       latvar
x2  1.1947485
x3  0.9356862
x4  0.0651955
x5 -0.8788177
         latvar 
1.50000 1.51267 

========================= Fitting model 1 =========================

Obtaining initial values

Using initial values
          x2   x3     x4    x5
latvar 0.637 1.52 -0.336 0.446

Using BFGS algorithm
initial  value 1020.744325 
iter  10 value 1014.137942
final  value 1014.132091 
converged

BFGS using optim(): 
Objective = 1014.132 
Parameters (= c(C)) =  0.2834194 0.7370396 -0.1134025 0.247253

Number of function evaluations = 70 


========================= Fitting model 2 =========================

Obtaining initial values

Using initial values
          x2  x3     x4    x5
latvar 0.636 1.5 -0.355 0.484

Using BFGS algorithm
initial  value 1020.822555 
iter  10 value 1014.133583
final  value 1014.131658 
converged

BFGS using optim(): 
Objective = 1014.132 
Parameters (= c(C)) =  0.2837208 0.7367099 -0.1130147 0.2470761

Number of function evaluations = 107 


========================= Fitting model 3 =========================

Obtaining initial values

Using initial values
          x2   x3    x4    x5
latvar 0.617 1.45 -0.42 0.617

Using BFGS algorithm
initial  value 1031.278257 
iter  10 value 1014.131755
iter  10 value 1014.131755
iter  10 value 1014.131755
final  value 1014.131755 
converged

BFGS using optim(): 
Objective = 1014.132 
Parameters (= c(C)) =  0.2838962 0.736647 -0.1133559 0.2472547

Number of function evaluations = 97 


========================= Fitting model 4 =========================

Obtaining initial values

Using initial values
          x2   x3     x4    x5
latvar 0.637 1.52 -0.336 0.446

Using BFGS algorithm
initial  value 1020.744360 
iter  10 value 1014.137942
final  value 1014.132091 
converged

BFGS using optim(): 
Objective = 1014.132 
Parameters (= c(C)) =  0.2834194 0.7370396 -0.1134025 0.247253

Number of function evaluations = 65 


========================= Fitting model 5 =========================

Obtaining initial values

Using initial values
          x2  x3     x4    x5
latvar 0.586 1.5 -0.289 0.482

Using BFGS algorithm
initial  value 1015.863326 
iter  10 value 1014.131938
final  value 1014.131648 
converged

BFGS using optim(): 
Objective = 1014.132 
Parameters (= c(C)) =  0.2837854 0.7366741 -0.1130643 0.2470977

Number of function evaluations = 107 


========================= Fitting model 6 =========================

Obtaining initial values

Using initial values
         x2   x3     x4    x5
latvar 0.64 1.47 -0.381 0.543

Using BFGS algorithm
initial  value 1023.616403 
iter  10 value 1014.132394
iter  10 value 1014.132394
final  value 1014.131668 
converged

BFGS using optim(): 
Objective = 1014.132 
Parameters (= c(C)) =  0.2837928 0.7367353 -0.1131283 0.2470575

Number of function evaluations = 108 


========================= Fitting model 7 =========================

Obtaining initial values

Using initial values
          x2   x3     x4   x5
latvar 0.658 1.47 -0.381 0.52

Using BFGS algorithm
initial  value 1023.892265 
iter  10 value 1014.132759
final  value 1014.131631 
converged

BFGS using optim(): 
Objective = 1014.132 
Parameters (= c(C)) =  0.2837333 0.7366672 -0.1130931 0.2471853

Number of function evaluations = 155 


========================= Fitting model 8 =========================

Obtaining initial values

Using initial values
          x2   x3     x4    x5
latvar 0.587 1.53 -0.285 0.445

Using BFGS algorithm
initial  value 1017.142354 
iter  10 value 1014.132180
iter  10 value 1014.132174
iter  10 value 1014.132174
final  value 1014.132174 
converged

BFGS using optim(): 
Objective = 1014.132 
Parameters (= c(C)) =  0.2839864 0.7368604 -0.1136613 0.2471132

Number of function evaluations = 81 


========================= Fitting model 9 =========================

Obtaining initial values

Using initial values
          x2   x3     x4    x5
latvar 0.602 1.51 -0.323 0.487

Using BFGS algorithm
initial  value 1017.822736 
iter  10 value 1014.132324
final  value 1014.131833 
converged

BFGS using optim(): 
Objective = 1014.132 
Parameters (= c(C)) =  0.2836218 0.7368244 -0.1133946 0.2473302

Number of function evaluations = 80 


========================= Fitting model 10 =========================

Obtaining initial values

Using initial values
          x2   x3     x4    x5
latvar 0.587 1.53 -0.285 0.445

Using BFGS algorithm
initial  value 1017.142344 
iter  10 value 1014.132180
iter  10 value 1014.132174
iter  10 value 1014.132174
final  value 1014.132174 
converged

BFGS using optim(): 
Objective = 1014.132 
Parameters (= c(C)) =  0.2839864 0.7368604 -0.1136613 0.2471132

Number of function evaluations = 82 

       latvar
x2  0.3539174
x3  0.9188889
x4 -0.1410678
x5  0.3083290
        latvar
x2  0.39546517
x3  1.08591532
x4 -0.09926116
x5  0.37025406

VGAM documentation built on Jan. 16, 2021, 5:21 p.m.