rin: Calculate multiple plot resemblance measures
In simba: A Collection of functions for similarity analysis of vegetation data

The functions calculate several multiple plot similarity measures. In addition, rin provides a wrapper that automates the calculation of multiple plot (site) resemblance measures in neighborhoods, including a test of whether the resemblance patterns found differ significantly from random.

mpd(x, method="simpson", all=FALSE)

mps(x, method="whittaker", all=FALSE)

mps.ave(x, method="soerensen", all=FALSE, foc=NULL, 
	what="mean", ...)

mos.f(x, foc, d.inc=FALSE, preso=FALSE, pc = NULL)

mos.ft(x, foc = NULL, method = "soerensen", quant = FALSE, binary = TRUE, ...)

sos(x, method="mean", foc=NULL, normal.sp=TRUE, normal.pl=TRUE)

rin(veg, coord, dn, func, test = TRUE, permutations = 100, 
	permute = 2, sfno = TRUE, p.level = 0.05, ...)

`x`	Species composition data, a matrix-like object.
`method`	Method for the calculation of multiple plot resemblance. The possible choices depend on the function used and include (among others) Simpson based multiple plot dissimilarity, Sørensen based multiple plot dissimilarity, Nestedness based multiple plot dissimilarity, Whittaker's beta, additive partitioning, Harrison multiple plot similarity, Harrison multiple plot turnover, Williams multiple plot turnover, average pairwise similarity (with a similarity measure of your choice from `sim`), Diserud & Ødegaard multiple plot similarity. The variants of `mos.f` (a new group of multiple plot similarity measures) are not chosen via `method` but result from setting its other arguments accordingly. For `sos` the choice is `mean` or `foc`. See details.
`all`	Logical. Depending on the function this argument has a different meaning. In `mps` and `mpd` it sets whether the results of all possible methods shall be given in the result, or only the method given in the `method` argument. Because some of the measures are just derived from others all methods are always calculated within the function when it is called and the `method` argument just triggers which to give back. In `mps.ave` it sets whether all statistics calculated (`mean` and `sd`) shall be given back or only the one specified by the `what` argument.
`foc`	Character vector of length one or an integer specifying the focal plot. Four of the functions are, or can be, sensitive to the species composition on the focal plot (`mos.f`, `mos.ft`, `mps.ave`, `sos`). The automation function `rin` is able to derive the identity of the focal plot automatically: just set `foc = foc` in the `func` argument (see example). When the functions are used stand-alone, either the name of the plot in quotes or the index of the plot within the species matrix (`x`) has to be given.
`what`	For `mps.ave`, which statistic (`mean` or `sd`) should be given back? See details.
`d.inc`	Logical. Shall all species that are in `veg` but not in the plots that make up a neighborhood be regarded when computing `mos.f`? This setting dramatically changes the behaviour of `mos.f` because it then becomes a symmetric similarity coefficient. Defaults to `FALSE` so that an asymmetric multiple plot similarity coefficient is computed. Only makes sense when `mos.f` is applied within `rin` and changes nothing otherwise.
`preso`	Logical. Shall a presence only version of `mos.f` be computed? Default is `FALSE`. See details.
`pc`	Numeric. Triggers whether pattern control is done in function `mos.f`. With pattern control (`pc` != NULL) the similarity of the focal plot to the pooled surrounding plots is evaluated. Doing that ensures that species which only occur on the focal plot or are only absent from the focal plot influence the resulting index value. With `pc` = 1 the binary variant is done, with `pc` > 1 a quantitative version is done. For details see Jurasinski et al. 2011.
`quant`	Logical. If `TRUE` use a quantitative index for calculating the similarity between the focal and the pooled surrounding plots.
`binary`	Logical. If `TRUE`, pool the data for the surrounding plots by taking the column sums and correct the abundances on the focal plot by multiplying them with the number of surrounding plots (to avoid a bias due to the area effect). If `FALSE`, the data are pooled by taking the proportional column sums and no correction is applied to the abundances on the focal plot.
`normal.sp`	In case of `sos` (sum of squares of species matrix, which is a measure of beta-diversity (Legendre et al. 2005)): Shall the result be normalized with respect to the number of species?
`normal.pl`	In case of `sos` (sum of squares of species matrix, which is a measure of beta-diversity (Legendre et al. 2005)): Shall the result be normalized with respect to the number of plots?
`veg`	Species composition data, a matrix-like object that should have been recorded in a regular array or a similar structure. It is divided into neighborhoods with a moving window so that each plot becomes the focal plot once, with a certain neighborhood of plots around it for which the multiple plot resemblance measures are then calculated.
`coord`	Spatial coordinates of the field plots where the data in veg comes from. The function expects a `data.frame` with two columns with the first column giving the x (easting) coordinate and the second giving the y (northing) coordinate in UTM or the like. These coordinates are used to calculate the neighborhoods within a moving window approach.
`dn`	Distance to neighbors or neighbor definition. A positive numeric, a two value vector (also positive numeric), or a character string. In the first case it gives the distance from each sampling unit in m until which other sampling units should be seen as neighbours. In the second the two values define a ring around each plot. Plots that fall into the ring are considered as neighbors. In the third case, the character string defines the number of k nearest neighbors that should be regarded as the neighborhood. This being a character just triggers a different way to calculate the neighbors. See details.
`func`	A character string that defines the formula which shall be applied to calculate a multiple plot resemblance measure for all possible neighborhoods within an array. For instance `"mpd(x)"` to compute the Simpson multiple plot dissimilarity coefficient sensu Baselga (2010). See details.
`test`	Logical. Shall the significance of the calculated values of multiple plot resemblance be tested regarding their deviation from random expectations? Defaults to `TRUE`. See details.
`permutations`	The number of permutations run for testing the significance. Defaults to 100, which is already slow, so test with the default before specifying a much higher number of runs.
`permute`	When testing with `rin`, how should the permutation of species to reflect random expectations be done: An integer of either 1, 2, or 3. With `1` the species matrix (`veg`) is permuted across rows. With `2` the species matrix (`veg`) is permuted across columns. With `3` the species in the focal plot are permuted (They are randomly drawn from the species pool).
`sfno`	Species from neighborhood only? Logical that only has an effect in combination with `permute` = 3. If `TRUE`, then the species are only drawn at random from the neighborhood species sub matrix. If set to `FALSE`, the species are drawn at random from the whole species matrix `veg`.
`p.level`	Significance level below which the resemblance patterns shall be considered as significantly different from random expectations. Defaults to 0.05. It is used to set the significance flags (asterisks) in the results.
`...`	Further arguments to the workhorse functions `mpd`, `mps`, `mps.ave`, `mos.f` can be passed via ....

Several multiple plot similarity indices have been presented that cure some of the problems associated with calculating compositional similarity for groups of plots by averaging pairwise similarities (Diserud and Ødegaard 2007, Baselga 2010). These indices calculate the similarity between more than two plots whilst considering the species composition on all compared plots. The resulting similarity value is true for the whole group of plots considered (called neighborhood in the following). Further, there are multiple plot similarity coefficients that are determined by the species composition on a reference plot (named focal plot in the following). All of these can be calculated with the functions described in this help file. See vignette for an overview table. Further, the function rin takes all of them and provides a framework for applying the measures to an array of plots to calculate multiple plot resemblance in neighborhoods (Jurasinski et al. submitted).

mps stands for multiple plot similarity, whereas mpd stands for multiple plot dissimilarity and mos stands for measure of singularity; the letters behind the "." further specify the class of measures that can be calculated with the respective function.

mps.ave calculates average multiple plot (dis-)similarities from pairwise (dis-)similarity calculations between the plots in the dataset or in the specified neighborhood. It has several options. When the foc argument is set to something other than NULL, only the pairwise (dis-)similarities between the specified focal plot and all others in the dataset (neighborhood) are used to calculate the mean and sd. When the specified focal plot does not exist, the function will issue a warning and stop. When run with defaults (foc = NULL), all pairwise similarities between the plots in the neighborhood (dataset) are considered. Any resemblance measure available via sim or sim.yo can be taken as the basis for calculating the average (dis-)similarity and its spread.
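The two modes of averaging can be illustrated with a minimal base R sketch. It is not the code of mps.ave: the helper name ave_soerensen and the toy matrix are made up for illustration, and only the pairwise Sørensen similarity 2a/(2a + b + c) on presence/absence data is covered.

```r
# Hypothetical helper, not part of simba: mean and sd of pairwise
# Soerensen similarities for a species matrix with plots in rows.
ave_soerensen <- function(x, foc = NULL) {
  x <- (as.matrix(x) > 0) * 1          # reduce to presence/absence
  a <- x %*% t(x)                      # shared species for each pair of plots
  s <- rowSums(x)                      # species number per plot
  sim <- 2 * a / outer(s, s, "+")      # 2a / (2a + b + c), since S_i + S_j = 2a + b + c
  if (is.null(foc)) {
    vals <- sim[upper.tri(sim)]        # every pair of plots once
  } else {
    if (is.character(foc)) foc <- match(foc, rownames(x))
    if (is.na(foc) || foc < 1 || foc > nrow(x)) stop("focal plot not found")
    vals <- sim[foc, -foc]             # focal plot against all other plots
  }
  c(mean = mean(vals), sd = sd(vals))
}

# toy data: three plots, four species (assumed for illustration)
toy <- rbind(p1 = c(1, 1, 1, 0),
             p2 = c(1, 1, 0, 0),
             p3 = c(0, 1, 1, 1))
ave_soerensen(toy)               # all pairs
ave_soerensen(toy, foc = "p1")   # only pairs that involve p1
```

With a focal plot, the average describes how similar that plot is to its surroundings; without one, it describes the neighborhood as a whole.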

mps calculates multiple plot (dis-)similarities that are either derived from other approaches to beta-diversity calculation (Whittaker's beta, additive partitioning), or have been around for quite a while (Harrison multiple plot dissimilarity, Harrison multiple plot turnover, Williams multiple plot turnover). None of these considers the actual species composition on each of the compared plots. The following methods are available (n = number of plots, S = number of species, γ = gamma diversity (S_n), α = alpha diversity (S_i)):

whittaker: Calculates Whittaker's beta (multiplicative partitioning, Whittaker 1960) β = γ/mean(α).

inverse.whittaker: Inverse Whittaker's beta (multiplicative partitioning). Scales between 1/n (when the considered plots do not share any species at all) and 1 (when all plots share the same species)

additive: Additive partitioning. Following Lande (1996) and keeping it with α = species number, the additive beta-component of the neighborhood (in the rin-case or the complete dataset in the mps-case) is calculated.

harrison: Harrison et al. (1992) multiple plot dissimilarity. A transformation of Whittaker's beta to be bounded between 0 and 1: (β_W - 1)/(n - 1).

diserud: Diserud & Ødegaard (2007) derived this from the pairwise Sørensen similarity measure. However, as Baselga highlights, this can also be derived from Whittaker's beta as (n - β_W)/(n - 1) and is basically the same as Harrison's multiple plot dissimilarity but expressed as a similarity.

harrison.turnover: ((γ/max(α))-1)/(n-1) (Harrison et al. 1992).

williams: 1 - max(α)/γ (Williams 1996).
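As a quick check of the formulas above, the following base R sketch computes all seven quantities from a presence/absence matrix. It is an illustration under assumptions, not the code of mps: the helper name mps_sketch and the toy data are invented, and the additive component is taken as γ - mean(α).

```r
# Hypothetical helper, not part of simba.
mps_sketch <- function(x) {
  x <- (as.matrix(x) > 0) * 1
  x <- x[, colSums(x) > 0, drop = FALSE]   # keep only species that occur
  n <- nrow(x)                             # number of plots
  gamma <- ncol(x)                         # gamma diversity
  alpha <- rowSums(x)                      # alpha diversity per plot
  bw <- gamma / mean(alpha)                # Whittaker's beta
  c(whittaker         = bw,
    inverse.whittaker = 1 / bw,
    additive          = gamma - mean(alpha),
    harrison          = (bw - 1) / (n - 1),
    diserud           = (n - bw) / (n - 1),
    harrison.turnover = (gamma / max(alpha) - 1) / (n - 1),
    williams          = 1 - max(alpha) / gamma)
}

# toy data: three plots, four species (assumed for illustration)
toy <- rbind(p1 = c(1, 1, 1, 0),
             p2 = c(1, 1, 0, 0),
             p3 = c(0, 1, 1, 1))
round(mps_sketch(toy), 3)
```

In this sketch harrison and diserud always add up to 1, which reflects the remark that the two express the same information as dissimilarity and similarity.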

mpd calculates multiple plot dissimilarity indices that have been suggested by Baselga (2010). The following methods are available (The implementation differs slightly from the one offered by Baselga in the electronic appendix of his paper and is computationally more efficient):

simpson: mps.Sim in the following. Baselga et al. (2007) derive this multiple plot dissimilarity coefficient directly from the pairwise Simpson dissimilarity index by applying it to a group of plots/sites. The authors emphasize that this coefficient is independent of patterns of richness and performs better than the Diserud & Ødegaard coefficient in cases of unequal species numbers between plots, because it discriminates between situations in which shared species are distributed evenly among plots or concentrated in a few pairs of sites.

sorensen: mps.Sor in the following. By building multiple site equivalents of the matching components (a, b, c) Baselga (2010) derives a Sørensen based measure of multiple plot dissimilarity.

nestedness: mps.nes in the following. Because the Sørensen based multiple plot dissimilarity coefficient accounts for both spatial turnover and nestedness whilst the Simpson based multiple plot dissimilarity coefficient accounts only for spatial turnover, it is possible to calculate the multiple plot dissimilarity that is completely due to nestedness by calculating mps.Sor - mps.Sim.
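The relation between the three measures can be sketched in base R. The code below is a direct transcription of the multiple-site formulas as published by Baselga (2010), written from memory of the paper, and is neither the implementation in mpd nor the one from Baselga's appendix; the helper name mpd_sketch and the toy data are assumptions.

```r
# Hypothetical helper, not part of simba.
mpd_sketch <- function(x) {
  x <- (as.matrix(x) > 0) * 1
  x <- x[, colSums(x) > 0, drop = FALSE]
  shared <- x %*% t(x)               # species shared by each pair of plots
  s.i <- diag(shared)                # species number per plot
  s.t <- ncol(x)                     # total species number
  b <- s.i - shared                  # b[i, j]: species on plot i but not on plot j
  up <- upper.tri(b)
  sum.min <- sum(pmin(b, t(b))[up])  # sum over pairs of min(b_ij, b_ji)
  sum.max <- sum(pmax(b, t(b))[up])  # sum over pairs of max(b_ij, b_ji)
  a.multi <- sum(s.i) - s.t          # multiple-site analogue of shared species
  sor <- (sum.min + sum.max) / (2 * a.multi + sum.min + sum.max)
  sim <- sum.min / (a.multi + sum.min)
  c(simpson = sim, sorensen = sor, nestedness = sor - sim)
}

# perfectly nested toy data: each plot holds a subset of the richer one
nested <- rbind(p1 = c(1, 1, 1, 1),
                p2 = c(1, 1, 0, 0),
                p3 = c(1, 0, 0, 0))
mpd_sketch(nested)
```

For the perfectly nested toy data the Simpson based value is 0, so the whole Sørensen based dissimilarity is attributed to nestedness.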

mos.f calculates a focal measure of singularity. In contrast to the other functions the different outcomes can be triggered by setting the further arguments accordingly.

The indices of mos.f change depending on the vegetation composition of the focal plot. The value is therefore true and valid only for the comparison of the focal plot with the surrounding plots. Not the similarity in the neighborhood, but the similarity of the focal plot to all others in the neighborhood is calculated. The calculation is based on the occurrences and non-occurrences of species on the compared plots with the species composition on the focal plot determining which of the two is to be used for which species: For all species that occur on the focal plot the proportional frequencies of occurrence in the neighborhood are summed up. For species that do not occur on the focal plot the proportional frequencies of non-occurrence in the neighborhood are summed up.

sum(f_oi) + sum(f_nj)

with f_oi = proportional frequency of occurrences of the ith species on the compared plots (only carried out for species that do occur on the focal plot) and f_nj = proportional frequency of non-occurrences of the jth species on the compared plots (only carried out for species that do not occur on the focal plot). The frequencies are calculated against the total number of cells in the species matrix and are therefore 'proportional frequencies' (in analogy to 'proportional abundances' as in diversity indices like Shannon or Simpson). Thus, if all compared plots have an identical species composition, the resulting value of the multi-plot similarity coefficient is 1. In this rather hypothetical case the species presence absence matrix would be filled with ones only. This is the null model against which the 'proportional frequencies' are calculated. Therefore, the coefficient can be interpreted as a measure of deviation from complete uniformity. There are three versions.
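One possible reading of this formula is sketched below in base R. It is only an interpretation of the description, not the code of mos.f: the helper name mosf_sketch is invented, and it assumes that x holds just the plots of one neighborhood, that species absent from all of these plots are dropped, and that the focal plot itself is counted among the compared plots.

```r
# Hypothetical helper, not part of simba; see the assumptions stated above.
mosf_sketch <- function(x, foc, preso = FALSE) {
  x <- (as.matrix(x) > 0) * 1
  x <- x[, colSums(x) > 0, drop = FALSE]   # species of the neighborhood only
  on.foc <- x[foc, ] == 1                  # species present on the focal plot
  cells <- nrow(x) * ncol(x)               # total number of cells in the matrix
  # occurrences of species that are on the focal plot
  f.o <- colSums(x[, on.foc, drop = FALSE]) / cells
  # non-occurrences of species that are not on the focal plot
  f.n <- colSums(1 - x[, !on.foc, drop = FALSE]) / cells
  if (preso) sum(f.o) else sum(f.o) + sum(f.n)
}

# toy data: three plots, four species (assumed for illustration)
toy <- rbind(p1 = c(1, 1, 1, 0),
             p2 = c(1, 1, 0, 0),
             p3 = c(0, 1, 1, 1))
mosf_sketch(toy, foc = "p1")                 # both terms
mosf_sketch(toy, foc = "p1", preso = TRUE)   # first term only

# identical plots give the maximum value of 1
mosf_sketch(rbind(c(1, 1, 0), c(1, 1, 0)), foc = 1)
```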

preso=TRUE: In this case a presence only version is calculated (mos.fpo). Therefore the second term is skipped and the formula simplifies to sum(f_oi). This very much glorifies the species composition on the focal plot and evaluates whether the surrounding plots in the neighborhood feature the same species.

d.inc=FALSE: When the d.inc argument is set to FALSE, only the species in the neighborhood build the basis against which the 'proportional frequencies' are calculated. This is the default index mos.f.

With preso = FALSE (the default) and d.inc = TRUE, a symmetric focal measure of singularity (mos.fs) results. It is definitely meant for use in the context of rin. The 'proportional frequencies' are calculated against the whole species matrix. Thus, the index is a symmetric similarity coefficient sensu Legendre & Legendre 1998 that considers species that do not occur on the compared plots but in the whole data set. Therefore, it is more appropriate for biodiversity or conservation studies and not so much for the investigation of ecological relationships. However, it can be interpreted as an 'ordination on the spot': By calculating mos.fs for a focal plot against its surrounding plots its position along the main gradient according to its species composition is estimated immediately because the species composition in the rest of the data set is incorporated in the construction of the proportional frequencies of the species. Because of this, mos.fs can be interpreted as a measure of deviation from complete unity in species composition. When the neighborhood is increased to the full data set, mos.f and mos.fs converge.

mos.ft calculates the singularity of a focal plot with respect to the pooled species composition on surrounding plots. Many binary or quantitative similarity indices can be used (all those that are available via sim and vegdist).

sos calculates the sum of squares of a species matrix. Legendre et al. (2005) show that this is a measure of beta-diversity. However, when you don't normalize against the number of species and/or plots, the obtained values can hardly be compared across data sets (or neighborhoods). Therefore, it is advisable to run this with defaults (normal.sp = TRUE and normal.pl = TRUE). For experiments, method can be set to "foc". Then the basis is not the deviation from the mean of the species occurrence across plots, but the deviation from the situation on a focal plot. This makes it somewhat related to the mos.f family of measures.
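The idea can be sketched in a few lines of base R. This is not the code of sos: the helper name sos_sketch is invented, the switch between the two variants is made via foc alone rather than via method, and normalizing by simple division by the number of species and plots is an assumption.

```r
# Hypothetical helper, not part of simba.
sos_sketch <- function(x, foc = NULL, normal.sp = TRUE, normal.pl = TRUE) {
  x <- as.matrix(x)
  # reference per species: column mean, or the value on the focal plot
  ref <- if (is.null(foc)) colMeans(x) else x[foc, ]
  ss <- sum(sweep(x, 2, ref)^2)        # sum of squared deviations
  if (normal.sp) ss <- ss / ncol(x)    # assumed normalization by species number
  if (normal.pl) ss <- ss / nrow(x)    # assumed normalization by plot number
  ss
}

# toy data: three plots, four species (assumed for illustration)
toy <- rbind(p1 = c(1, 1, 1, 0),
             p2 = c(1, 1, 0, 0),
             p3 = c(0, 1, 1, 1))
sos_sketch(toy, normal.sp = FALSE, normal.pl = FALSE)  # raw sum of squares
sos_sketch(toy)                                        # normalized
sos_sketch(toy, foc = "p1")                            # deviation from focal plot
```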

rin applies the other functions to an array of plots. For each plot a neighborhood is constructed via the dn argument and the specified index is calculated for all plots and neighborhoods. The function to be calculated is specified simply by the func argument. For instance, with func = "mpd(x, method='sorensen')" the function rin calculates the Sørensen multiple plot dissimilarity for each plot and its neighborhood in an array. The functions that need the identity of a focal plot (mps.ave, mos.f, and mos.ft) automatically derive the focal plots. However, to trigger this it has to be specified within the func argument: func = "mos.f(x, foc = foc)".
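The moving window logic can be sketched as follows. This is a simplified base R illustration, not the code of rin: the helper names neighborhoods and rin_sketch, the toy grid, and the choice to count the focal plot as part of its own neighborhood are all assumptions. The three forms of dn (distance, ring, number of nearest neighbors given as character) are handled as described for the dn argument.

```r
# Hypothetical helpers, not part of simba.
neighborhoods <- function(coord, dn) {
  d <- as.matrix(dist(coord))                  # pairwise distances between plots
  lapply(seq_len(nrow(d)), function(i) {
    if (is.character(dn)) {                    # k nearest neighbors
      k <- as.integer(dn)
      order(d[i, ])[seq_len(k + 1)]            # focal plot (distance 0) plus k others
    } else if (length(dn) == 2) {              # ring around the focal plot
      sort(unique(c(i, which(d[i, ] >= dn[1] & d[i, ] <= dn[2]))))
    } else {                                   # all plots within distance dn
      unname(which(d[i, ] <= dn))
    }
  })
}

rin_sketch <- function(veg, coord, dn, fun) {
  veg <- as.matrix(veg)
  nbs <- neighborhoods(coord, dn)
  sub <- lapply(nbs, function(nb) veg[nb, , drop = FALSE])
  data.frame(
    n.plots = lengths(nbs),
    n.spec  = vapply(sub, function(x) sum(colSums(x) > 0), numeric(1)),
    dis     = vapply(sub, fun, numeric(1))
  )
}

# toy data: 3 x 3 grid with unit spacing, six species (assumed for illustration)
coord <- expand.grid(x = 1:3, y = 1:3)
veg <- outer(1:9, 1:6, function(i, j) as.integer((i + j) %% 3 != 0))

# Whittaker's beta as an example of a measure without focal plot
whittaker <- function(x) {
  x <- x[, colSums(x) > 0, drop = FALSE]
  ncol(x) / mean(rowSums(x))
}

res <- rin_sketch(veg, coord, dn = 1, fun = whittaker)
res
```

Plots at the edge of the array end up with smaller neighborhoods than plots in the centre when a distance is given, whereas the character form of dn keeps the number of plots per neighborhood constant.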

The functions mpd, mps, mps.ave, mos.f, and mos.ft return a single value with the calculated index (according to the method argument, or to the other arguments). When all is set to TRUE, mps.ave returns two values (the average and the standard deviation of the pairwise similarities in the neighborhood), whereas mpd and mps return a named numerical vector with the values for all indices that can be calculated with the respective function.

rin gives back a table (a data.frame) that reports several values for each plot in the dataset, one plot per row. The first three columns are always returned. In case test = TRUE, three more columns with information on the significance test are returned.

`n.plots`	Number of plots that make up the neighborhood.
`n.spec`	Number of species that occur in the neighborhood.
`dis`	Value of the calculated (dis)similarity index per plot.
`p.val`	p value of the permutation test. According to the `permute` argument the data set is shuffled. The random data is subjected to the same calculations `permutations` times. The original value of multiple plot similarity is compared to the distribution of random values to obtain this p.
`sig`	Significance flag. Just a translation of the p value into a significance flag. There are only two possibilities: "*" value is significantly different from random, "ns" value is not significantly different from random.
`sig.sign`	The sign of the significance value. The tail which is tested is determined by the relation of the multiple plot similarity value to the average multiple plot similarity value of the random test distribution. Thus, the sign shows whether the multiple plot similarity is significantly higher than can be expected from random expectations (`+`) or lower (`-`).
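How the three test columns relate to each other can be sketched in base R. This is an illustration of the description above and not the code of rin: the helper name perm_summary is invented, and the exact p value convention, (count + 1)/(permutations + 1), is an assumption.

```r
# Hypothetical helper, not part of simba.
# obs:  index value calculated from the original data
# rand: index values calculated from the permuted data
perm_summary <- function(obs, rand, p.level = 0.05) {
  upper <- obs >= mean(rand)                   # which tail is tested
  extreme <- if (upper) sum(rand >= obs) else sum(rand <= obs)
  p.val <- (extreme + 1) / (length(rand) + 1)  # assumed convention
  data.frame(p.val    = p.val,
             sig      = if (p.val < p.level) "*" else "ns",
             sig.sign = if (upper) "+" else "-")
}

rand <- rep(1:9, 11)     # 99 made-up values standing in for permutation results
perm_summary(10, rand)   # observed value above all random values
perm_summary(0, rand)    # observed value below all random values
perm_summary(5, rand)    # observed value in the middle of the distribution
```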

rin is not optimized and could perhaps profit from some C code. So when test = TRUE it takes a while because of the permutations.

Gerald Jurasinski gerald.jurasinski@uni-rostock.de, with contributions by Vroni Retzer vroni.retzer@gmx.de

Baselga A (2007) A multiple–site similarity measure independent of richness. Biology Letters 3: 642–645.

Baselga A (2010) Partitioning the turnover and nestedness components of beta diversity. Global Ecology and Biogeography 19: 134–143.

Diserud OH, Ødegaard F (2007) A multiple–site similarity measure. Biology Letters 3: 20–22.

Harrison S, Ross SJ, Lawton JH (1992) Beta-diversity on geographic gradients in Britain. Journal of Animal Ecology 61: 151–158.

Jurasinski G, Jentsch A, Retzer V, Beierkuhnlein C (2011) Assessing gradients in species composition with multiple plot similarity coefficients. Ecography 34: 1-16.

Lande R (1996) Statistics and partitioning of species diversity, and similarity among multiple communities. Oikos 76: 5–13.

Legendre P, Borcard D, Peres-Neto P (2005) Analyzing beta diversity: partitioning the spatial variation of community composition data. Ecological Monographs 75: 435–450.

Williams PH (1996) Mapping variations in the strength and breadth of biogeographic transition zones using species turnover. Proceedings of the Royal Society of London Series B–Biological Sciences 263: 579–588.

Whittaker RH (1960) Vegetation of the Siskiyou Mountains, Oregon and California. Ecological Monographs 30: 279–338.

sim, vegdist, dsvdis for pairwise similarity measures.

## Not run: 
# load the data that comes with the package
data(abis)

# calculate a multiple plot dissimilarity index 
# (Sørensen sensu Baselga) for whole dataset
mpd(abis.spec, method="sorensen")

# calculate a multiple plot dissimilarity index
# (Sørensen sensu Baselga) for each plot and 
# its neighborhood
abis.mpd.so <- rin(abis.spec, coord=abis.env[,1:2], 
	dn=100, func="mpd(x, method='sorensen')")

# plot the grid of plots and show the calculated 
# multiple plot dissimilarity value through the 
# size of the symbol and the sign of the value
# with a superimposed "+" or "-"
with(abis.mpd.so, {
	plot(abis.env[,1:2], cex=symbol.size(dis), pch=c(21,1)[sig], 
		bg="grey50", xlab="", ylab="")
	subs <- sig == "*"
	points(abis.env[subs,1:2], pch=c("-", "+")[sig.prefix[subs]])
})

# calculate a multiple plot similarity index
# that takes care of the species composition
# on the focal plot
rin(abis.spec, coord=abis.env[,1:2], test=FALSE,
	dn=100, func="mos.f(x, foc=foc)")

## End(Not run)