This document provides a short introduction to the Price equation partition, useful for disentangling links between changes in diversity and ecosystem function between pairs of communities. Additionally, it provides examples of useful R functions developed for this workshop to make applying the Price equation easy (hopefully). Finally, the end of the document walks through an example analysis of a larger data set.

Eventually, these tools and documentation will be wrapped into an R package.


Getting Started

Load Price tools

Many of the R functions designed for this workshop can be found in the Price_FUNCTIONS_050916.R file, or a more recent version. You can open and run this file directly, or use the source() command.

to.dropbox<-c('/Users/colin/Dropbox/')    # select your own path to dropbox as needed
pth<-paste(to.dropbox,
           '/sCAFE_SharedFolder/sCAFE_R_Code/development/Price_FUNCTIONS_050916.R',sep='')
source(pth)

Setting up data

To calculate the Price Partition, we need data on the identity and function of species from two communities, X and Y. We can then use the function price.part() to obtain the components explaining change in function between communities. First, however, we need to make sure that our community data is in the right format.

Load example data for demoing tools:

pth<-paste(to.dropbox,"/sCAFE_SharedFolder/sCAFE_Documentation/example_data.csv",sep='')
price.data <- read.csv(pth)

We can either load a data file that is already formatted for price.part(), or we can load data in a format more typical of empirical data sets, and run that data through the function data.setup() to format it correctly. The desired final format has a row for each unique species that occurs in one or both communities. Columns include the species ID, the function of each species in X and Y, and three book-keeping columns that track whether each species appears in both X and Y, or X, or Y. Species that do not appear in a community are listed as having 0 function in that community.

Method A:

One data set with three columns.

head(price.data)

comm <- data.setup(list(price.data))
head(comm)

Method B:

dataX <- price.data[price.data$biomassX !=0, c(1,2)]
dataY <- price.data[price.data$biomassY != 0, c(1,3)]

Two data sets with two columns.

head(dataX)
head(dataY)

comm <- data.setup(list(dataX,dataY))
head(comm)

Calculating Price equation partition

After we have taken data from two communities, X and Y, and created a properly formatted data object (either by hand, or by using the data.setup() function), we can use the price.part() function to compute the price equation partition for these communities.

price.part(comm)

Following Fox & Kerr 2012, the output gives us values for the Price equation partition.

It also provides terms that quantify ecosystem change from the CAFE and BEF perspectives.

And additional values.


Exploring some real data

Example Data: Effects of nitrate fertilization on plant community composition and function

This example uses data from Cedar Creek on plant community cover and biomass, as a function of different levels of nitrate fertilization.

## Load data
pth <- paste(to.dropbox,"/sCAFE_SharedFolder/sCAFE_Data/Original/CedarCreek/harpole/e001_Plant aboveground biomass data.txt",sep='')
cdr <- read.table(pth,sep="\t", skip=135, header=T,nrows=46756)
head(cdr)

## Clean up data:

# Drop litter & other oddballs:
cdr <- cdr %>% filter(Species!="Miscellaneous litter")
cdr <- cdr %>% filter(!(Species %in% c("Miscellaneous grasses",
                                       "Miscellaneous herbs","Mosses & lichens"))) 

# Focus on just year 1992
cdr92 <-  cdr %>% filter(Year==1992)

# Just field D
cdr92D <- cdr92 %>% filter(Field=="D")

cdr92D$Species <- as.character(cdr92D$Species)

head(cdr92D)

Single Price equation comparison

This data set provides many many different communities that we hope to compare. For now, we will just pull out two of them, corresponding to two replicates within the same site, with the same combination of treatments.

comX <- cdr92D %>% filter(Plot==1,NTrt==1)
comY <- cdr92D %>% filter(Plot==17,NTrt==1)

# Only need to keep the species ID and function columns:
comX <- comX[,c('Species','Biomass')]
comY <- comY[,c('Species','Biomass')]

# Set up the data:
comm <- data.setup(list(comX,comY))
head(comm)

Great! Now we can run a Price equation partition on these two communities:

price.part(comm)

Again, we obtain the 5 components outlined in Fox & Kerr 2012. To provide some interpretation, we are trying to understand how changes in the presence and function of species influences differences in total function between community X and Y.

Multiple, pairwise Price comparisons

This is great, but if we have to repeat this process over and over again for the thousands and thousands of possible pairwise comparisons between communities and treatments in this data set, it will get very tedious and confusing. In the next section, I want to introduce you to some tools that will automate this process.

The first step is to take our entire data set and provide information on the set of columns that are used to group species observations into a single community data set. In this case, this includes our treatment variables (NTrt) as well as columns indicating sampling structure (Plot). In a different data set, this might also include a time variable, like sampling date.

First we will look at the high nitrate addition treatment. This requires subsetting our data, which you can do in a bunch of ways (here I am making use of tools from the dplyr package).

Calculating pairwise comparisons

For this example, we will return to data from Cedar Creek on the composition and function of plant communities. Before making calculations, we need to organize and process our data. Initially we will focus just on the most extreme nutrient addition, 28 g/m2/yr.

data1 <- cdr92D %>% filter(NAdd %in% c(0,27.2))

Another step we need to take is to identify the grouping variable(s) that organize our data, usually based on the treatment and replication structures in our data set. This is necessary for allowing our code to identify the unique communities that we want to compare.

# Define a set of grouping and treatment variables and associate them with the data object:
group.vars <- c('Plot')
treat.vars <- c('NTrt','NAdd')
grouped.data1 <- data1 %>% group_by_(.dots=c(group.vars,treat.vars))

Having grouped our data, we can use a function called pairwise.price() which will take our data frame and compute the Price equation paritions for all pairwise combinations of communities identified by our grouping variables. When we call the pairwise.price() function, we have to provide it with our grouped data, and also indicate which columns in the grouped data set contain the species IDs (species="Species") and the ecosystem function we are examining (func="Biomass").

CAUTION - This function can take a while to run, as the number of pairwise comparisons can be quite large. It is worth pausing to think a moment before running this function so you are aware of the size of the computational task you are setting for your computer (and maybe whether you have time to go have a coffee).

# Calculate pairwise comparisons of sampled communities using the price equation.
#   -  NOTE: Self-comparisons of each community to itself are automatically excluded
res1<- pairwise.price(grouped.data1,species="Species",func="Biomass")
head(res1)

This is pretty awesome. For each of our treatment, site, and replicate combinations, we now have the 5-part Price equation partition, as well as combinations of these terms (SL, SG, SR, CE). There are also additional columns keeping track of the function and richness of the baseline and comparison communities, and the number of shared species between communities.

Take a look at the second line of the res1 data frame. It should look pretty familiar, because it is the set of results we obtained from our single Price equation comparison in the previous section. But now we have all of the possible pairwise comparisons, which will allow us to disentangle treatment effects from background noise caused by sampling error or process error. The next section will explore ways of visualizing and analyzing this data set of pairwise Price comparisons.

Visually comparing Price partition results

After manipulating these results a little bit, we can use a set of new graphing functions to explore visually and statistically how different decompositions of changes in ecosystem function (BEF, CAFE, Price) respond to the imposed nutrient enrichment treatment.

Data setup

# Create a single column keeping track of the paired set of enrichment treatments & other grouping variables:
pp1<-res1
pp1<-group.columns(pp1,gps=c(group.vars,treat.vars),drop=T)
head(pp1)

Depending on the analyses that we are interested in, and what we want to test, we do not need to examine all pairs of comparisons. We can subset the results of pairwise.price() to remove unneeded comparisons. For example, in this analysis, we are interested in the control-control comparisons (here, comparisions of communities where NAdd=0). We also want to retain the control-treatment comparisons (here, comparisions of communities where NAdd=(0 and 27.2)), but not the treatment-control comparisons, because we want to be able to identify directional effects of imposing a disturbance treatment.

# Subset pairwise results:
pp1<-pp1[pp1$NAdd %in% c('0 0','0 27.2'),]

# Update factor labeling for Disturbance treatment (helps later with plotting)
pp1$NAdd<-factor(as.character(pp1$NAdd),levels=c('0 0','0 27.2'))
head(as.data.frame(pp1))

# Stash data on distinct sets of comparisons separately (to aid plotting)
dat1<-pp1[pp1$NAdd %in% c('0 27.2'),]
dat1.ctrl<-pp1[pp1$NAdd %in% c('0 0'),]

CAFE-style vector plots

We can plot the result of the pairwise Price comparisons using the leap.zip() function, which is essentially a wrapper for a complex ggplot construction. It requires inputting a data set of comparisons, resulting from the pairwise.price() computation we ran earlier. This leap.zip() function can accept a large number of options, which give the user control over the appearance and content of the final plot. Several examples follow.

First, we can look at the CAFE-style decomposition of changes in ecosystem function (by specifying type='cafe').

leap.zig(dat1,type='cafe',main="Enrichment \n(0 vs. 27.2)")

Additional options allow us to provide plot titles, change the size of the plotting window, and display the mean vectors associated with each component as well as associated error bars.

leap.zig(dat1,type='cafe',main="Enrichment \n(0 vs. 27.2)",
              xlim=c(3,18),ylim=c(-100,700))

leap.zig(dat1,type='cafe',main="Enrichment \n(0 vs. 27.2)",
              xlim=c(3,18),ylim=c(-100,700),error.bars=T)

leap.zig(dat1,type='cafe',main="Enrichment \n(0 vs. 27.2)",
              xlim=c(3,18),ylim=c(-100,700),error.bars=T,vectors=T)

Note that by default, the leap.zig() function standardizes all changes in function by the total function of the baseline communities. As a result, all y-axis values can be viewed as %changes in ecosystem function relative to the baseline community. Alternatively, we can avoid this choice of standardization by:

leap.zig(dat1,type='cafe',main="Enrichment \n(0 vs. 27.2)",standardize = FALSE)

These plots can get quite busy. Sometimes it will be helpful to make similar plots, but display only the mean vectors across pairwise comparisons:

leap.zig(dat1,type='cafe',main="Enrichment \n(0 vs. 27.2)",
              xlim=c(3,18),ylim=c(-100,700),raw.points=F,error.bars=T,vectors=T)

We can also get a sense of how this plot of control-treatment pairs looks compared with control-control pairs. This requires saving the graphical vector plots that result from multiple treatments. Then we can draw plots side-by-side:

s1 <- leap.zig(dat1.ctrl,type='cafe',main="Enrichment \n(0 vs. 0)",
             xlim=c(3,18),ylim=c(-100,700),error.bars=T,
             vectors=T,raw.points = F,legend=FALSE)
s2 <- leap.zig(dat1,type='cafe',main="Enrichment \n(0 vs. 27.2)",
             xlim=c(3,18),ylim=c(-100,700),error.bars=T,
             vectors=T,raw.points = F,legend=FALSE)
grid.arrange(s1,s2,nrow=1)

BEF-style vector plots

Analogous sets of plots can be produced for the BEF decomposition of changes in ecosystem function.

leap.zig(dat1,type='bef',main="Enrichment \n(0 vs. 27.2)",
         xlim=c(3,18),ylim=c(-100,700),error.bars=T,vectors=T)

With vectors only:

leap.zig(dat1,type='bef',main="Enrichment \n(0 vs. 27.2)",
         xlim=c(3,18),ylim=c(-100,700),raw.points=F,error.bars=T,vectors=T)

5-part Price vector plots

Analogous sets of plots can be produced for the full 5-part Price decomposition of changes in ecosystem function.

leap.zig(dat1,type='price',main="Enrichment \n(0 vs. 27.2)",
         xlim=c(3,18),ylim=c(-100,700),error.bars=T,vectors=T)

Or just the vectors:

leap.zig(dat1,type='price',main="Enrichment \n(0 vs. 27.2)",
         xlim=c(5,15),ylim=c(-50,520),raw.points=F,error.bars=F,vectors=T)

Visually comparing treatments

We can also compare the results of multiple treatments, in a variety of ways.

First, CAFE vs. BEF comparisons can be made easily using type='both' in function leap.zig(), as follows:

leap.zig(dat1,type='both',standardize=T,
         xlim=c(3,18),ylim=c(-100,700),error.bars=F,
         main="Enrichment \n(0 vs. 27.2)",vectors=T,raw.points = F,legend=T)

Other comparisons can also be made, by saving the output of one or more vector plots, which then can be drawn side-by-side, or on top of each other.

To demonstrate this, we can process data for all of the different nitrate addition treatments at once, using code very similar to the previous section:

data2<- cdr92D %>% filter(NTrt != 9)

group.vars<-c('Plot')
treat.vars<-c('NTrt','NAdd')
grouped.data2 <- data2 %>% group_by_(.dots=c(group.vars,treat.vars))

res2<- pairwise.price(grouped.data2,species="Species",func="Biomass")
pp2<-res2

# retain only comparisons against NTrt.x = 1
pp2<-pp2[pp2$NTrt.x==1,]
pp2<-group.columns(pp2,gps=c(group.vars,treat.vars),drop=T)
head(pp2)

Then we can compare vector plots, either side-by-side:

tmp1 <- pp2 %>% filter(NAdd=="0 5.44")
s1<-leap.zig(tmp1,type='cafe',main="Enrichment \n(0 vs. 5.44)",
             xlim=c(0,20),ylim=c(-100,700),error.bars=F,
             vectors=T,raw.points = F,legend=FALSE)

tmp2 <- pp2 %>% filter(NAdd=="0 27.2")
s2<-leap.zig(tmp2,type='cafe',main="Enrichment \n(0 vs. 27.2)",
             xlim=c(0,20),ylim=c(-100,700),error.bars=F,
             vectors=T,raw.points = F,legend=FALSE,linetype=2)

grid.arrange(s1,s2,nrow=1)

Then on top of each other:

leap.zig(tmp2,type='cafe',main="Comparing different nitrate levels",
         xlim=c(0,20),ylim=c(-100,700),
         error.bars=F,vectors=T,raw.points = F,legend=FALSE,
         add=TRUE,old.plot=s1,linetype=2)

Statistical comparisons

We have also designed a suite of statistical tests that can be run on each of the components behind the vectors in the visualizations we just explored. At the simplest, these depend on comparing two distributions for each component, such as the CDE term. The first distribution comes from the set of all pairwise comparisons of control-control communities, while the second comes from the control-treatment pairs. Currently, we use parametric tests to determine whether these distributions differ in terms of their means and variances.

To provide an example, let us first examine the BEF decomposition for the Disturbance data. In terms of vector plots, we saw:

s1<-leap.zig(dat1.ctrl,type='bef',main="Enrichment \n(0 vs. 0)",
             xlim=c(5,25),ylim=c(-10,700),error.bars=F,
             vectors=T,raw.points = F,legend=FALSE)
s2<-leap.zig(dat1,type='bef',main="Enrichment \n(0 vs. 27.2)",
             xlim=c(5,25),ylim=c(-10,700),error.bars=F,
             vectors=T,raw.points = F,legend=FALSE)
grid.arrange(s1,s2,nrow=1)

When we run the statistical tests, we see:

test.partitions(pp1,type='bef',treat.var = 'NAdd',control = '0 0',print=F,plot=F)

This function returns a table of statistical results. Within the table, the first column specifies the variable (vector component) being tested. Reading across the columns left to right, we find.

In this case, we see that there are significant changes in.

NOTE: With these analyses, it is quite easy to end up running tests based on distributions composed of a large number of individual values (essentially, the number of possible pairwise comparisons can grow large quite easily). In many situations this endows our statistical tests with the power to detect even quite small effects as significant. It is very important to pay attention to effect sizes as a result of this.

As an aid to interpretation, we can run this same analysis and produce an optional visual tool, using the option plot=T:

test.partitions(pp1,type='bef',treat.var = 'NAdd',control = '0 0',print=F,plot=T)

Similar results can be obtained invoking different decompositions of change in ecosystem function (e.g., CAFE, 5-part Price), as follows:

test.partitions(pp1,type='cafe',treat.var = 'NAdd',control = '0 0',print=F,plot=F)

test.partitions(pp1,type='price',treat.var = 'NAdd',control = '0 0',print=F,plot=F)

NOTE: I have included estimates for slopes/magnitudes for the CAFE components only. We are still trying to understand if these alternate parameterizations of vectors offer any additional useful interpretations over the raw X and Y components of each vector. If they prove useful, the BEF and Price code can be extended to match.

CAUTION: for compatability with the vector plots, these statistical tests should be standardized (or not standardized), depending on earlier choices, using the flag: standardize=T.

NOTE: standardizing values has some interesting effects on control-control components. In several cases, values that we expect to be centered on zero due to symmetry actually shift away from zero. This is mathematically correct, if initially unintuitive, and arises from taking the mean of often highly skewed distributions. Whether this is desirable or avoidable is an open question.


Global, multivariate statistical analysis

The following examples show methods of analysis developed early in 2016, prior to our first sCAFE meeting. They still have significant utility, but are not being actively developed or applied at present.

The general idea is that rather than running a barrage of univariate tests on different components of various partitions, we can perform a single global test of the significance of treatment differences. This depends on calculating the distance between pairs of communities based on their signatures in multidimensional space (5-part Price, CAFE, BEF).

Generating distance matrix (5-part and 3-part)

So, we can use the 5-part (SRE.L, SRE.G, SCE.L, SCE.G, CDE) or CAFE (SL, SG, CDE) Price equation decomposition to generate a distance matrix between sets of community pairs that we have compared with the Price equation. I have written a function, get.dist.mats() that can take the output of pairwise.price(), after removing unnecessary comparisons, and create these distance matrices.

CAUTION: It is easy to unintentionally ask R to create an enormous distance matrix, so think about the size of your data set before running this command.

# Pull out a subset of data to demonstrate this for.
head(pp1)

# size of distance matrix:
nrow(pp1)^2

Run get.dist.mats() to calculate the distance matrices for this data set.

res0<-get.dist.mats(pp1)

# The output has three different components:
names(res0)

head(res0$covars)
res0$dist3[1:5,1:5]
res0$dist5[1:5,1:5]

The resulting structure, res0 has three important components. The first is a data frame of experimental covariates, keeping track of the site and replicate of X and Y community pairs, as well as the corresponding values of the nitrate treatments. The rows of this data frame will match the rows of the distance matrices. Two distance matrices are generated, based on either the 5-part or 3-part versions of the Price equation partition.

Running a permanova analysis

Now that we have the distance matrices in hand, we can use a permutational manova approach to examine the effect of the nitrate treatments. This treatment will have 2 levels, including comparisons within treatment (0 vs. 0 nitrate), and across treatments (0 vs. 27.7 gm nitrate). Note that I removed the (27.2 vs. 0 nitrate) pairs because we want to be able to detect the direction of any treatment effect that may exist.

table(res0$covars$NAdd)

The permutational MANOVA function we have selected is adonis(), from the vegan package. It takes the total variation between pairs in the distance matrix, and sequentially apportions it between treatment effects and residual variation. An approximate p-value is associated with each treatment based on random permutations of rows/columns of the distance matrix, effectively scrambling up treatment groupings. For large matrices, this can be quite computationally demanding. The call to the adonis() function includes an option to run calculations in parallel - I have set this option to 5 because I have a bunch of cores on my laptop. Change this value as necessary to suit your hardware.

Here is an example of running adonis() for the 5-part and 3-part partitions. Within the command, we provide a distance matrix on the left side of the formula (~) and one or more treatment variables on the right side, just as we would in a regular linear (lm()) model. We also provide the corresponding data frame in which covariates can be found. Here goes:

a1<-adonis(res0$dist5 ~ NAdd,data=res0$covars,parallel = 5)
a1

a3<-adonis(res0$dist3 ~ NAdd,data=res0$covars,parallel = 5)
a3

In both cases, we see that the nitrate treatment has a significant effect (p <= 0.001), and a reasonable R2 (~0.33). The default number of permutations is 999, so the smallest p-value possible is 0.001 (unless the number of permutations is increased). Based on our preliminary analysis, it is often the case with Price analyses that we work with a large sample size, which gives us really high power - p-values are almost always significant. It's likely going to be more important for us to pay attention to effect sizes of treatments, using metrics such as the R2 values reported by adonis().

It's worth noting that adonis() can return significant results for two reasons: a) Treatment groups may have different means (centroids in multivariate space), or b) Treatment groups may have different dispersions (levels of spread in multivariate space), but similar means, or both. There is also a test available for this multivariate dispersion:

b1<-betadisper(as.dist(res0$dist5),as.factor(res0$covars$NAdd))
anova(b1) # dispersion test:

Visual representation of permanova analysis

It also helps the interpretation of the adonis() results if we add in some visualizations of the variation in Price equation components across treatments.

This first graphic looks at the marginal distributions of Price equation components across treatments:

# extract relevant information from res0 and smoosh it into long-form structure:
dat4<-melt(res0$covars[,c('NAdd','SRE.L','SRE.G','SCE.L','SCE.G','CDE')],
           id.vars=c('NAdd'))

# Tidy the data for plotting
dat4$variable<-factor(dat4$variable,levels=c('SRE.L','SCE.L','CDE','SRE.G','SCE.G'))
dat4$NAdd<-factor(dat4$NAdd)

# Generate plot of marginal distributions
b3<-ggplot(dat4,aes(y=value,x=NAdd,variable))
b3<-b3+geom_hline(yintercept=0)+
#  geom_violin(aes(fill=NAdd),adjust=1.2)+
  geom_boxplot(aes(fill=NAdd))+
  facet_wrap(~variable,scales = "free",nrow=2)+
  scale_x_discrete()+
  theme(legend.position=c(0.8,0.15),
        axis.text.x=element_blank())
b3

These are essentially the marginal distributions we tested earlier in this document using test.partitions().

Another way of looking at these results is to run a PCoA (analogous to the adonis() test), and plot the corresponding ellipses on the first set of important axes.

# run PCoA and look at dispersion
b1<-betadisper(as.dist(res0$dist5),as.factor(res0$covars$NAdd))
b2<-as.data.frame(b1$vectors)
b2$NAdd<-b1$group
wts<-(b1$eig/sum(b1$eig))

# panels by NAdd treatment
e1<-ggplot(b2,aes(x=PCoA1,y=PCoA2,NAdd))
e1<-e1+geom_point(aes(colour=NAdd),alpha=0.6)+
  stat_ellipse(aes(fill=NAdd),geom="polygon",level=0.95,alpha=0.2)+
  scale_x_continuous(paste("PCoA1 (",100*round(wts[[1]],2),"%)",sep=""))+
  scale_y_continuous(paste("PCoA2 (",100*round(wts[[2]],2),"%)",sep=""))+
  ggtitle("Nitrate treatment")
e1

Finally, it is worth noting that these techniques can be extended to much more complicated models and combinations of treatments, including discrete or continuous covariates, as well as interactions between covariates. The plotting gets more complex, and the calculations take longer, but in principle everything works.


Final Thoughts

So ends this tutorial on applying the Price equation partition to community data sets, using R functions. Additional documentation on the functions in the Price_FUNCTIONS_050916.R file is actively being written, and appears in the comments of that file.


Wishlist

There are a whole bunch of avenues and tasks that could be extended (and hopefully will be over the coming months). Here is a partial list:

  1. Plotting
  2. Shift current code into true object-oriented approach, defining classes for bef, cafe, & price data sets, and associated plotting and testing methods?
  3. Generalize code to allow for facetting of ggplots over multiple levels of a treatment variable, rather than the current approach of pasting together complete graphs
  4. Set up an option to draw individual vector paths for specific pairs of communities.

  5. Analysis

  6. Check the behavior of plotting and analysis functions for un-standardized function values, possible bugs.
  7. Add nonparametric tests to test.partitions()
  8. Extend tools & existing code stubs to account for more complex experimental designs, including multiple treatments, interaction effects, etc.

  9. Simulation model



ctkremer/priceTools documentation built on May 28, 2019, 7:49 p.m.