Travis-CI Build Status

Introduction

This is an R package for the bromeliad working group rainfall experiment. Our intention is to create a set of tools that facilitate all steps of the process:

At present the package allows each group of authors to obtain the datasets they will need to begin their analysis.

Please open an issue if there is a feature you would like to see, or if you discover an error or bug!

Installation

bwgtools can be installed directly from Github using devtools:

install.packages("devtools") # if you don't have devtools
library(devtools)
install_github("SrivastavaLab/bwgtools", dependencies = TRUE)

Accessing Dropbox

bwgtools uses rdrop2 to access your Dropbox account, then uses readxl to download the data and read it directly into R. When you first run library(bwgtools) you will see the following message:

library(bwgtools)

Important Note: Protect your account!

In Dropbox: by default, drop_auth() saves your Dropbox login to a file called .httr-oauth. Of course, we don't want this shared with everyone in our Dropbox folder, as they would then be able to access your personal Dropbox account! Therefore, we set cache=FALSE. This will require us to re-authenticate every time we want to download fresh data. This should be a quick and painless process, especially if you are already logged in on your computer. However, bwgtools should work from any computer connected to the internet, provided that you have your Dropbox username and password.

Working outside of Dropbox: running drop_acc() or drop_auth() will create a file called .httr-oauth in your directory, which will contain your login credentials for Dropbox. Remember to add this file to your .gitignore if you are using git. For more information, see ?rdrop2::drop_auth.

Reading a single sheet

Once you have authenticated with Dropbox, you can read data directly into R. To obtain a single tab (for example, the "leaf.waterdepths" tab) for a single site (for example, Macae), use the function read_sheet_site:

macae <- read_site_sheet("Macae", "leaf.waterdepths")

knitr::kable(head(macae))

Working offline

The first argument to this function can either be the name of a site ("Macae", in the example above), or a path to the location of the file on your hard drive. For example, on my (Andrew's) computer, the path to Dropbox is:

../../../Dropbox

So I could read in my local copy of the data like this:

macae <- read_site_sheet("../../../Dropbox/BWG Drought Experiment/raw data/Drought_data_Macae.xlsx", "leaf.waterdepths")

This is an example of a relative path; the symbol .. means "one directory above my current location". This is the relative path from the directory where I am developing bwgtools to the directory where the Macae data is stored, on my machine. There is more information about relative paths here.

To save typing, I've made a convenience function that fills in the relative path for you. It requires the name of the sheet you want, and the path from your current working directory to the full Dropbox folder:

macae <- read_site_sheet(offline("Macae", default.path = "../../../Dropbox/"), "leaf.waterdepths")

## or, equivalently:
library(magrittr)

macae <- "Macae" %>% 
  offline(default.path = "../../../Dropbox/") %>%  ## use your own path
  read_site_sheet(sheetname = "leaf.waterdepths")

PLEASE NOTE: the major disadvantage of this approach is that your local version of the data may be behind the official version on the Dropbox website. To be safe, use the online version whenever you can.

Reading multiple sheets

For all our analyses, we will need to collect data from all sites. ; we can also load and combine the same tab across all sites, with a single function: combine_tab(). This function has two arguments: the first is a vector of all the site names (as spelt in the file names in the raw data/ folder). The second is the name of the sheet to read (as above). For example, to obtain the "site.info" tab for all sites, use the following command:

library(dplyr)
library(knitr)
c("Argentina", "French_Guiana", "Colombia",
  "Macae", "PuertoRico","CostaRica") %>%
  combine_tab(sheetname = "site.info") %>% 
  head %>% 
  kable

combine_tab() does a little bit more than simply combine the output of read_site_sheet(). It also checks the names of the datasets before combining, creates unique bromeliad id numbers, and reshapes the invertebrate data from "wide" to "long" format. Here is a quick explanation of each of these modifications in turn:

bromeliad ids

Many sites have simply numbered their bromeliads, meaning that there are several bromeliads labelled "1", each in a different site. This is not an error on the part of researchers, however it is a potential danger when combining datasets. To make the bromeliad identification numbers unambiguous, I have combined the site name and the bromeliad name into a new variable, "site_brom.id":

phys_data <- c("Argentina", "French_Guiana", "Colombia",
  "Macae", "PuertoRico","CostaRica") %>%
  combine_tab(sheetname = "bromeliad.physical")

kable(phys_data[1:6, 1:3])

"wide" to "long" format

Most of the tabs have a standard shape -- i.e. a fixed number of columns and rows. The exception is two tabs that contain insect data: bromeliad.initial.inverts and bromeliad.final.inverts. In order to combine data from these datasets, it is necessary to convert them to "long" format: i.e., to "gather" all insect columns into only three: one for the name of the species (the former column header) one for abundance, and another for biomass:

invert_data <- c("Argentina", "French_Guiana", "Colombia",
  "Macae", "PuertoRico","CostaRica") %>%
  combine_tab(sheetname = "bromeliad.final.inverts")

kable(invert_data[1:6, ])

functional groups -- abundance and biomass.

Now that we can obtain data on all the invertebrates found at the end of the experiment, we can calculate the abundance and biomass of different functional groups by combining species that are in the same category. There are two levels of groups into which we can group species:

|pred_prey |func.group| |:------------|:---------| |predator |engulfer | | |piercer | |prey |scraper | | |gatherer | | |piercer | | |shredder |

Accessing functional traits

In order to do this, we will need to access data about all the species ever observed in bromeliads, combining their trait data with their BWG nicknames (the column headers from bromeliad.final.invert, now in the column species in the output of combine_tab()). Formerly, this data was kept in an excel sheet called "Distributions.organisms.correct.xls". Now, several of us have embarked on a project to improve this data and fill in blanks. The result of these efforts (which are still ongoing, though mostly complete) is this .tsv file. Eventually, it will be added to the master database that the BWG is constructing. In the meantime, it can be downloaded directly from github using get_bwg_names() :

bwg_names <- get_bwg_names()

kable(head(bwg_names))

Merging and plotting functional trait data

The first step in combining the functional traits and the observed insect data is to match insect names in both datasets. This is easily accomplished by the base function merge() or by dplyr::left_join(). For convenience, bwgtools::merge_func() does the latter for you:

### merge with functional groups
invert_traits <- merge_func(invert_data, bwg_names)

kable(head(invert_traits))

Now, we need to calculate the total abundance and biomass for every group defined by site, site_brom.id, and either pred_prey or func.group. We can write a function that does this for us, and allows us to control the groups that will be summed together. Using dplyr's non-standard evaluation we can define the groups.

REQUEST for feedback: the syntax here is admittedly a bit strange. Should I just write little convenience functions that perform these tasks? sum_functional() and sum_trophic() for example?

For example, to summarize by functional group:

#4 summarize by functional group
func_groups <- sum_func_groups(invert_traits,
                               grps = list(~site,
                                           ~site_brom.id,
                                           ~func.group))
kable(head(func_groups))

This is a convenient format for plotting:

library(ggplot2)
## functional group abundance
func_groups %>%
  ggplot(aes(x = as.factor(func.group), y = total_abundance)) +
  geom_point(position = position_jitter(width = 0.25), alpha = 0.5) +
  stat_summary(fun.data = "mean_cl_boot", colour = "red", size = 0.6) +
  facet_wrap(~site, scales = "free_y") +
  ggtitle("Functional group abundance")

To summarize by trophic level group, simply switch ~func.group to ~pred_prey:

predprey <- sum_func_groups(invert_traits,
                               grps = list(~site,
                                           ~site_brom.id,
                                           ~pred_prey))

predprey %>%
  ggplot(aes(x = as.factor(pred_prey), y = total_abundance)) +
  geom_point(position = position_jitter(width = 0.25), alpha = 0.5) +
  stat_summary(fun.data = "mean_cl_boot", colour = "red", size = 0.6) +
  facet_wrap(~site, scales = "free_y") +
  ggtitle("Trophic level abundance")

Hydrology -- based on rainfall schedule

under construction right now the only measure I'm working on is derived from the number of consecutive dry days. We can use the metric "longest period of consecutive dry days", and/or quantify their distribution in some other way

Hydrology -- based on water depths

We have daily water measurements for some sites, and from these we will be able to calculate several measures of "hydrological stability". First, we obtain the leaf.waterdepths data in the usual way:

leafwater <- c("Argentina", "French_Guiana", "Colombia",
               "Macae", "PuertoRico","CostaRica") %>%
  combine_tab("leaf.waterdepths")

We also need two other tabs: site.info and bromeliad.physical. The former states the beginning and end of the experiment, and the latter the block for each bromeliad. These are necessary pieces of information to calculate the date at which each bromeliad was placed in the field and removed from the field. We need this information because some groups did not measure water at the beginning (Costa Rica) or did so irregularly (Costa Rica, Argentina). Thus we need to fill in missing days.

sites <- c("Argentina", "French_Guiana", "Colombia",
           "Macae", "PuertoRico","CostaRica") %>%
  combine_tab("site.info")

phys <- c("Argentina", "French_Guiana", "Colombia",
          "Macae", "PuertoRico","CostaRica") %>%
  combine_tab("bromeliad.physical")

We can obtain all the hydro variables with one compound function: hydro_variables():

hydro <- hydro_variables(waterdata = leafwater,
                         sitedata = sites,
                         physicaldata = phys)
kable(head(hydro))

This function should "just work". As you can see, it is performing one check (removing any groups which have all NA records) and performs two joins (to correct the errors discussed above).

It seems that the centre leaf is not like the others. Therefore by default the fucntion removes it. If you want it anyway, set rm_centre = TRUE (Note the Canadian spelling).

hydro2 <- hydro_variables(waterdata = leafwater,
                         sitedata = sites,
                         physicaldata = phys, rm_centre = FALSE)
kable(head(hydro2))

The default behaviour is to calculate all these metrics at the level of the leaf well, the same scale at which the measurements were taken. If you want you can obtain these same calculations AFTER first averaging water depths across all leaves of a plant, within each day. That is, we can make these same calculations at the scale of the bromeliad:

hydro3 <- hydro_variables(waterdata = leafwater,
                         sitedata = sites,
                         physicaldata = phys, aggregate_leaves = TRUE)
kable(head(hydro3))

Licence

We release the code in this repository under the MIT software license. see text of license here



SrivastavaLab/bwgtools documentation built on May 9, 2019, 1:54 p.m.