
Overview

MATSS is a package for conducting Macroecological Analyses of Time Series Structure. We designed it to help researchers quickly get started in analyses of ecological time series, and to reinforce and spread good practices in computational analyses.

We provide functionality to gather ecological time series datasets, build and run analysis pipelines over those datasets using the drake workflow package, and package the code and results as reproducible research compendia.

Installation

You can install MATSS from github with:

# install.packages("remotes")
remotes::install_github("weecology/MATSS", build_opts = c("--no-resave-data", "--no-manual"))

And load the package in the typical fashion:

library(MATSS)

Example Research Compendium

One of the best ways to get started is to create a research compendium. An auto-updating example is visible at https://github.com/weecology/MATSSdemo.

To get started, identify the location and name for your compendium. For example, ~/MATSSdemo will put the compendium inside your home directory (the ~ location), with the package name "MATSSdemo". (Note that package names can only contain ASCII letters, numbers, and "." and have to start with a letter.)

create_MATSS_compendium("<path>")
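
For example, to create the demo compendium described above in your home directory:

create_MATSS_compendium("~/MATSSdemo")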

Compendium Creation Steps

Running this code will perform the following operations:

  1. Create a new R package project at the specified path, named after the final folder in that path.
  2. Add template analysis code for you to customize, including the analysis/pipeline.R script and a report template.
  3. Generate a README with instructions for installing the compendium and running the analysis.

Running the Code

After creating the new project, the README will contain further instructions for running the code. We summarize them briefly here (a sketch of these steps follows the list):

  1. The compendium exists as an R package and needs to be installed first.
  2. R needs to be restarted.
  3. The analysis script in analysis/pipeline.R can be run to perform the analysis and generate the report.
  4. The compiled report at analysis/report.md can be viewed.
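
A minimal sketch of these steps from an R session, assuming the devtools package is installed and your working directory is the compendium root:

# 1. install the compendium as a package
devtools::install()

# 2. restart R, then run the pipeline to build the targets and compile the report
source("analysis/pipeline.R")

# 3. open analysis/report.md (e.g. on GitHub or in RStudio) to view the compiled report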

For further details about how the code in the template project works, see the guide below to interacting with the datasets, the drake workflow package, and our tools for building reproducible analyses.

Data

Packaged datasets

Several datasets are included with this package. They can be loaded individually using the following functions and require no additional setup:

get_cowley_lizards()
get_cowley_snakes()
get_karoo_data()
get_kruger_data()
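
Each of these functions returns a dataset ready for use in analyses. To take a quick look at what one contains, a minimal sketch (str() simply summarizes the object's structure):

karoo <- get_karoo_data()
str(karoo, max.level = 1)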

Configuring download locations:

Other datasets require downloading. To facilitate this, we include functions to help configure a specific location on disk. To check your current setting:

get_default_data_path()

and to configure this setting (and then follow the instructions therein):

use_default_data_path("<path>")
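
For example, to store downloaded data in a folder called ~/MATSS-data (a hypothetical path used only for illustration):

use_default_data_path("~/MATSS-data")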

Downloading datasets:

To download individual datasets, call install_retriever_data() with the name of the dataset:

install_retriever_data("veg-plots-sdl")

To download all the datasets that are currently supported (i.e. with associated code for importing and formatting):

download_datasets()
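
To check what has been downloaded, you can list the contents of the configured data folder (a minimal sketch using base R):

list.files(get_default_data_path())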

Preprocessing datasets:

We tap into several collections of datasets in MATSS, so it is useful to do some preprocessing to split the raw database files into separate datasets. These databases are BBS (the North American Breeding Bird Survey) and BioTIME (ecological assemblages from the BioTIME Consortium).

Processing these databases is necessary before loading in individual datasets.

prepare_datasets() # wrapper function to prepare all datasets
# prepare_biotime_data()
# prepare_bbs_ts_data()
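
Putting the pieces together, a first-time data setup might look like this (a minimal sketch; the path is only an example, use_default_data_path() may ask you to restart R before continuing, and the downloads can take a while):

use_default_data_path("~/MATSS-data")   # choose a download location (example path)
download_datasets()                     # download all supported datasets
prepare_datasets()                      # split the raw BBS and BioTIME databases into separate datasets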

Working with Drake

We designed MATSS to build on the drake workflow package for computational analyses. Thus, it can be helpful to have a general understanding of how to use drake.

Basic Workflow

The basic approach to using drake is:

  1. Define a plan with drake_plan(), giving each target a name and an expression that computes it.
  2. Run the plan with make(), which builds any targets that are missing or out of date.
  3. Retrieve the built targets from the cache with readd() or loadd().

Provided Helper Functions

We provide several functions to help construct plans.

Usage of these functions is demonstrated in the template R script generated from create_MATSS_compendium().
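
While the MATSS-specific helpers are best seen in that template, the underlying pattern is plain drake: build separate sub-plans and then combine them. A generic sketch using only drake functions (the target names here are made up for illustration):

library(drake)

# a sub-plan defining the datasets
datasets_plan <- drake_plan(dataset_a = mtcars,
                            dataset_b = iris)

# a sub-plan defining analyses of those datasets
analyses_plan <- drake_plan(summary_a = summary(dataset_a),
                            summary_b = summary(dataset_b))

# combine the sub-plans and run everything
full_plan <- bind_plans(datasets_plan, analyses_plan)
make(full_plan)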

Example

library(drake)
library(dplyr)

# define the plan
plan <- drake_plan(data_1 = mtcars, 
                   data_2 = iris, 
                   my_model = lm(mpg ~ disp, data = data_1), 
                   my_summary = data_2 %>%
                       group_by(Species) %>%
                       summarize_all(mean))

# run the plan
make(plan)

# check resulting objects
readd(my_model)
readd(my_summary)

Running Drake Plans

Drake plans are run by calling make(). This does several things: first, it checks the cache to see which targets need to be (re)built; then, it builds those targets in an order that respects the dependencies between them (e.g. an analysis target that depends on a dataset target is built after that dataset).

The drake user manual has more information about how drake stores its cache and how it decides which targets to rebuild.
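
To inspect the cache from an R session, a minimal sketch using standard drake functions (the target names follow the example above):

cached()            # list the targets currently stored in the cache
loadd(my_model)     # load a cached target into your environment
readd(my_summary)   # return a cached target's value directly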

Note that if there are file inputs or outputs, it is important to declare them explicitly using, e.g., file_in(), knitr_in(), and file_out(). This lets drake check whether those files have changed and rebuild the targets that depend on them if needed; otherwise, drake treats the file paths as fixed strings.

plan <- drake_plan(data = read.csv("some_data.csv"))
make(plan)

# make some changes to `some_data.csv`
make(plan) # will NOT rebuild the `data` target
plan <- drake_plan(data = read.csv(file_in("some_data.csv")))
make(plan)

# make some changes to `some_data.csv`
make(plan) # will rebuild the `data` target
